Virtual inheritance overhead in g++

By now every C++ engineer worth her salt knows that virtual inheritance is not free. It carries overhead in object code size, at runtime (both CPU and memory), and at compile time (both time and memory). For an in-depth discussion of how virtual inheritance is implemented in C++ compilers, see “Inside the C++ Object Model” by Stanley Lippman. In this post I would like to consider the object code size as well as the compilation time and memory overheads, since modern C++ implementations normally sacrifice these for runtime speed, and they can present major surprises. Unlike existing studies on this subject, I won’t bore you with “academic” metrics such as per-class or per-virtual-function overhead, or with synthetic tests. Such metrics and tests have two main problems: they don’t give a feeling for the overhead experienced by real-world applications, and they don’t factor in the extra code needed to make up for the functionality that virtual inheritance would otherwise provide.

It is hard to come by non-trivial applications that can provide the same functionality with and without virtual inheritance. I happened to have access to such an application, and what follows is a quick description of the problem virtual inheritance was used to solve. I will then present some measurements of the overhead, obtained by comparing it with the same functionality implemented without virtual inheritance.

The application in question is XSD/e, a validating XML parser/serializer generator for embedded systems. Given a definition of an XML vocabulary in XML Schema, it generates a parser skeleton (a C++ class) for each type defined in that vocabulary. Types in XML Schema can derive from each other, and if two types are related by inheritance then it is often desirable to be able to reuse the base parser implementation in the derived one. To support this requirement, the current implementation of XSD/e uses the C++ mixin idiom that relies on virtual inheritance:

// Parser skeletons. Generated by XSD/e.
//
struct base
{
  virtual void
  foo () = 0;
};
 
struct derived: virtual base
{
  virtual void
  bar () = 0;
};
 
// Parser implementations. Hand-written.
//
struct base_impl: virtual base
{
  virtual void
  foo ()
  {
    ...
  }
};
 
struct derived_impl: virtual derived,
                     base_impl
{
  virtual void
  bar ()
  {
    ...
  }
};
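
To see the idiom work end to end, here is a minimal, self-contained sketch of the fragment above. The string return values, the virtual destructor, and the run() helper are assumptions added purely for illustration (the original functions return void and their bodies are elided); none of them are part of XSD/e.

```cpp
#include <cassert>
#include <string>

// Parser skeletons (the kind of code XSD/e would generate).
struct base
{
  virtual ~base () {}              // added for the sketch
  virtual std::string foo () = 0;  // returns a string only so the call is observable
};

struct derived: virtual base
{
  virtual std::string bar () = 0;
};

// Hand-written implementations; the bodies are hypothetical placeholders.
struct base_impl: virtual base
{
  virtual std::string foo () { return "base_impl::foo"; }
};

// Mixin: there is a single shared base subobject, so base_impl::foo()
// is the final overrider of foo() for the whole object.
struct derived_impl: virtual derived,
                     base_impl
{
  virtual std::string bar () { return "derived_impl::bar"; }
};

// Hypothetical helper: uses a parser through the skeleton interface only.
inline std::string run (derived& d)
{
  return d.foo () + "+" + d.bar ();
}
```

Calling run() on a derived_impl instance should reach base_impl::foo() even though derived_impl never mentions foo(), which is the implementation reuse the mixin approach is after.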

This approach works well, but we quickly found out that for large vocabularies with hundreds of types the resulting object code produced by g++ was unacceptably large. Furthermore, on a schema with a little more than a thousand types, g++ with optimization turned on (-O2) ran out of memory on a machine with 2GB of RAM.

After some analysis we determined that virtual inheritance was to blame. To resolve this problem we developed an alternative, delegation-based implementation reuse method, which will appear in the next release of XSD/e. It is almost as convenient to use as the mixin approach because all the support code is automatically generated by the XSD/e compiler. The idea behind the delegation-based approach is illustrated in the following code fragment:

// Parser skeletons. Generated by XSD/e.
//
struct base
{
  virtual void
  foo () = 0;
};
 
struct derived: base
{
  derived (base* impl)
    : impl_ (impl)
  {
  }
 
  virtual void
  bar () = 0;
 
  virtual void
  foo ()
  {
    assert (impl_);
    impl_->foo ();
  }
 
private:
  base* impl_;
};
 
// Parser implementations. Hand-written.
//
struct base_impl: base
{
  virtual void
  foo ()
  {
    ...
  }
};
 
struct derived_impl: derived
{
  derived_impl ()
    : derived (&base_impl_)
  {
  }
 
  virtual void
  bar ()
  {
    ...
  }
 
private:
  base_impl base_impl_;
};
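
The same kind of self-contained sketch can be written for the delegation-based fragment above. As before, the string return values, the virtual destructor, and the run() helper are illustrative assumptions. There is one more deviation: the base_impl instance is kept in a hypothetical private holder base (base_impl_member) rather than in a plain data member, so that it is already constructed when its address is handed to derived. This is a sketch under those assumptions, not the code XSD/e generates.

```cpp
#include <cassert>
#include <string>

// Parser skeletons (the kind of code XSD/e would generate).
struct base
{
  virtual ~base () {}              // added for the sketch
  virtual std::string foo () = 0;
};

struct derived: base
{
  explicit derived (base* impl): impl_ (impl) {}

  virtual std::string bar () = 0;

  // Generated forwarding function: delegate to the base implementation.
  virtual std::string foo ()
  {
    assert (impl_);
    return impl_->foo ();
  }

private:
  base* impl_;
};

// Hand-written implementations; the bodies are hypothetical placeholders.
struct base_impl: base
{
  virtual std::string foo () { return "base_impl::foo"; }
};

// Hypothetical holder: base classes are constructed in declaration order,
// so base_impl_ exists before derived's constructor receives its address.
struct base_impl_member
{
  base_impl base_impl_;
};

struct derived_impl: private base_impl_member, derived
{
  derived_impl (): derived (&base_impl_) {}

  virtual std::string bar () { return "derived_impl::bar"; }
};

// Hypothetical helper: uses a parser through the skeleton interface only.
inline std::string run (derived& d)
{
  return d.foo () + "+" + d.bar ();
}
```

From the caller's side nothing changes compared to the mixin version: run() on a derived_impl still ends up in base_impl::foo(), only now through an explicit forwarding call instead of a shared virtual base.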

The test executable for the above-mentioned thousand-type schema, optimized for size (-Os) and stripped, is 15MB when built using virtual inheritance. It also takes 19 minutes to build, and the peak memory usage of the C++ compiler is 1.6GB. For comparison, the same executable built using the delegation-based approach is 3.7MB in size, takes 14 minutes to build, and has a peak compiler memory usage of 348MB. That’s right, the executable is 4 times smaller. Note also that the generated parser skeletons are not just a bunch of pure virtual function signatures; they include XML Schema validation, data conversion, and dispatch code. The measurements also showed that the runtime performance of the two reuse approaches is about the same, most likely because g++ performs a similar delegation under the hood, except that it has to handle all possible use cases, hence the object code overhead.
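
For those who want the ratios spelled out, the following fragment simply restates the measurements quoted above and divides them. The struct and function names are made up for this sketch, and 1.6GB is taken as 1600MB, which is an assumption about the units.

```cpp
#include <cassert>

// Measurements as quoted in the text; nothing here is newly measured.
struct build_stats
{
  double size_mb;      // stripped -Os executable size, MB
  double build_min;    // build time, minutes
  double peak_mem_mb;  // peak compiler memory, MB
};

// 1.6GB is taken as 1600MB (assumption).
const build_stats virtual_inheritance = {15.0, 19.0, 1600.0};
const build_stats delegation = {3.7, 14.0, 348.0};

// How many times larger the virtual inheritance figure is.
inline double ratio (double with_vi, double without_vi)
{
  return with_vi / without_vi;
}
```

With these inputs the executable size ratio comes out at roughly 4.05, the build time ratio at roughly 1.36, and the peak memory ratio at roughly 4.6 (or about 4.7 if 1.6GB is read as 1638MB).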

This entry was posted on Thursday, April 17th, 2008 at 4:00 am and is filed under GCC g++, C++ Compilers, C++.

5 Responses to “Virtual inheritance overhead in g++”

Robert 'Groby' Blum Says:
April 17th, 2008 at 12:59 pm
Hm. That has me curious to play with GCC to see what actually happens. One problem that I can spot right away in your original implementation is that derived_impl is derived *twice* from base. Once through derived, once through base_impl.

That means whenever you call foo() on a derived_impl, gcc needs to disambiguate between base::foo() and base_impl::foo(). (Which is a tricky problem.)

Your second solution does the disambiguation for the compiler - it specifies that a call to derived_impl can only use derived::foo, which will automatically forward to base_impl::foo.

I’m surprised that the difference in memory usage and compile time is that big, though.

If you have a stripped sample project, I’d like to run it on various other compilers….

Paolo Bonzini Says:
April 17th, 2008 at 2:07 pm
Can you report these two testcases to the GCC bugzilla (and CC me, bonzini@gnu.org, on the testcases)? Chances are that the build times and memory usages can be improved a lot.

Cam Says:
April 17th, 2008 at 11:43 pm
@Groby
Isn’t the point of using virtual inheritance in the first example that something inherits from a base class more than once (aka the dreaded diamond)? Or am I missing something?

boris Says:
April 18th, 2008 at 5:07 am
Robert, Paolo,

It will be hard to come up with a stand-alone test case since the generated code depends on the runtime library. But it is all open-source and the schema is publicly available so you will be able to reproduce this once the next version of XSD/e with support for delegation-based reuse is out. I can also provide makefiles, option files, etc. Let me know if you are interested.

Robert 'Groby' Blum Says:
April 18th, 2008 at 1:09 pm
@boris: Absolutely - I’d appreciate a quick e-mail when I can try this myself. I’m doing a bit of research into compilation speeds of C++, so I’d love to have sample cases.

@Cam: Even though it’s virtual, it’s still a diamond. derived_impl still can call either derived::foo() or base_impl::foo(). Yes, derived::foo() is abstract - but at every invocation of foo() the compiler needs to actually find out which one you meant.

I do think that memory usage and compile time are excessive, but I’m not surprised the second example is simpler. I’d love to see the gcc team’s take on this….