
GCC can now be built with a C++ compiler

Tuesday, May 8th, 2012

You probably heard about the decision to allow the use of C++ in GCC itself. But it is one thing to say this and quite another to actually make a large code base like GCC compile with a C++ compiler instead of a C compiler. Well, GCC 4.7 got one step closer to this goal and can now be compiled with either a C or a C++ compiler. Starting with 4.8, the plan is to build GCC in the C++ mode by default. There is also a C++ Build Status page that tracks GCC 4.8 on various targets.

To enable the C++ mode in GCC 4.7, we use the --enable-build-with-cxx GCC configure option. As one would expect, different distributions made different decisions about how to build GCC 4.7. For example, Debian and Ubuntu use C++ while Arch Linux uses C. These differences are not visible to a typical GCC user, which is why neither the GCC 4.7 release notes nor the distributions mention any of this. In fact, I didn’t know about the new C++ build mode until ODB, which is implemented as a GCC plugin, mysteriously failed to load with GCC 4.7. This “war story” is actually quite interesting, so I am going to tell it below. At the end I will also discuss some implications of this change for GCC plugin development.

But first, a quick recap of the GCC plugin architecture: a GCC plugin is a shared object (.so) that is dynamically loaded using the dlopen()/dlsym() API. As you may already know, with such dynamically-loaded shared objects, symbol exporting can work both ways: the executable can use symbols from the shared object, and the shared object can use symbols from the executable, provided the executable was built with the -rdynamic option in order to export its symbols. This back-exporting (from the executable to the shared object) is quite common in GCC plugins since, to do anything useful, a plugin will most likely need to call some GCC functions.

Ok, so I built ODB with GCC 4.7 and tried to run it for the first time. The error I got looked like this:

 
cc1plus: error: cannot load plugin odb.so
odb.so: undefined symbol: instantiate_decl
 

Since the same code worked fine with GCC 4.5 and 4.6, my first thought was that in GCC 4.7 instantiate_decl() was removed, renamed, or made static. So I downloaded the GCC source code and looked for instantiate_decl(). Nope, the function was there, the signature was unchanged, and it was still extern.

My next guess was that building GCC itself with the -rdynamic option was somehow botched in 4.7. So I grabbed the Debian build logs (this is all happening on a Debian box with the Debian-packaged GCC 4.7.0) and examined the configure output. Nope, -rdynamic was passed as before.

This was getting weirder and weirder. Running out of ideas, I decided to examine the list of symbols that are in fact exported by cc1plus (this is the actual C++ compiler; g++ is just a compiler driver). Note that these are not the normal symbols that we see when we run nm (and which can be stripped). These symbols come from the dynamic symbol table, and we need to use the -D (--dynamic) nm option to see them:

 
$ nm -D /usr/lib/gcc/x86_64-linux-gnu/4.7.0/cc1plus | 
grep instantiate_decl
0000000000529c50 T _Z16instantiate_declP9tree_nodeib
 

Wait a second. This looks a lot like a mangled C++ name. Sure enough:

 
$ nm -D -C /usr/lib/gcc/x86_64-linux-gnu/4.7.0/cc1plus | 
grep instantiate_decl
0000000000529c50 T instantiate_decl(tree_node*, int, bool)
 

I then ran nm without grep and saw that all the text symbols are mangled. Then it hit me: GCC is now built with a C++ compiler!
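For the curious, the nm -C step can be reproduced in code. This is a minimal sketch that assumes a GCC or Clang toolchain, whose C++ runtime provides abi::__cxa_demangle() in <cxxabi.h>; the demangle() wrapper is a hypothetical helper, not part of GCC or ODB.

```cpp
#include <cstdlib>   // std::free
#include <cxxabi.h>  // abi::__cxa_demangle (GCC and Clang runtimes)
#include <string>

// Turn an Itanium C++ ABI mangled name into a readable signature, which
// is what nm -C (or c++filt) does on the command line. Names that are
// not mangled (for example, plain C symbols) are returned unchanged.
std::string demangle (const char* name)
{
  int status = 0;
  char* r = abi::__cxa_demangle (name, nullptr, nullptr, &status);

  if (status != 0 || r == nullptr)
    return name;

  std::string s (r);
  std::free (r); // The demangled string is allocated with malloc().
  return s;
}
```

With it, demangle ("_Z16instantiate_declP9tree_nodeib") yields instantiate_decl(tree_node*, int, bool), while the plain C name instantiate_decl that the plugin was asking for comes back unchanged because it is not a mangled name at all. The dynamic linker only sees the two different spellings, hence the undefined symbol.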

Seeing that the ODB plugin is written in C++, you may be wondering why it still referenced instantiate_decl() as a C function. Prior to 4.7, the GCC headers that a plugin had to include weren’t C++-aware. As a result, I had to wrap them in an extern "C" block. Because GCC 4.7 can be built in either C or C++ mode, that extern "C" block is only necessary in the former case. Luckily, the config.h GCC plugin header defines the ENABLE_BUILD_WITH_CXX macro when GCC was built in the C++ mode, which we can use to decide how we should include the rest of the GCC headers:

 
#include <config.h>
 
#ifndef ENABLE_BUILD_WITH_CXX
extern "C"
{
#endif
 
...
 
#ifndef ENABLE_BUILD_WITH_CXX
} // extern "C"
#endif
 

There is also an interesting implication of this switch to the C++ mode for GCC plugin writers. In order to work with a GCC 4.7 that was built in the C++ mode, a plugin will have to be compiled with a C++ compiler even if it is written in C, since the GCC functions it calls are now exported under their mangled C++ names. Once the GCC developers actually start using C++ in the GCC source code, it won’t be possible to write a plugin in C anymore.

Who calls this function?

Wednesday, February 29th, 2012

Let’s say we have a large project and we want to find out from which places in our code a particular function is called. You may be wondering why you would want to know. The most common reason is to eliminate dead code: if the function is not called, then it may not be needed anymore. Or maybe you just want to refresh your memory about this area of your code base. The case that triggered this post involved replacing calls to one function with calls to another. I was adding support for composite object ids in ODB and was gradually changing the code to use a more generalized version of a certain function while still maintaining the old version for the code yet to be ported. While I knew about most of the areas that needed changing, in the end I needed to verify that nobody was calling the old function anymore and then remove it.

So how do we find out who calls a particular function? The method that I am sure most of you have used before is to comment the function out, recompile, and use the C++ compiler error messages to pinpoint the calls. There are a few problems with this approach, however. First of all, depending on your build system, the compilation may stop before showing you all the call sites (make -k helps here but is still not a bulletproof solution). So to make sure that you have seen all the places, you may also have to keep commenting out the calls and recompiling until you get no more errors. This is annoying.

This approach also won’t work if a call can be resolved to another overloaded version. This was exactly the situation I encountered. I had two functions that looked like this:

class traverser
{
  void traverse (type&);   // New version.
  void traverse (class_&); // Old version.
};

Here class_ derives from type, so when I commented out the old version, the calls were happily resolved to the new version without any errors.
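To see this silent fallback in isolation, here is a small self-contained sketch. The two traverser variants (hypothetical names) stand for the class before and after commenting out the old overload, and each function returns a string so we can tell which one was picked.

```cpp
#include <string>

struct type { virtual ~type () {} };
struct class_: type {};   // class_ derives from type.

// Both versions present: a class_ argument picks the exact match.
struct traverser_both
{
  std::string traverse (type&) { return "new"; }
  std::string traverse (class_&) { return "old"; }
};

// Old version "commented out": the very same call now binds to
// traverse (type&) via the derived-to-base conversion, with no error.
struct traverser_new_only
{
  std::string traverse (type&) { return "new"; }
};
```

Given a class_ object c, calling traverse (c) on a traverser_both returns "old" while the same call on a traverser_new_only returns "new", and the compiler accepts both without a word.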

Another similar situation is when you have a function in an outer namespace that will be used if you comment out a function in an inner namespace:

void f ();
 
namespace n
{
  void f ();
 
  void g ()
  {
    // Will resolve to outer f() if inner f() is
    // commented out.
    //
    f ();
  }
}
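The same effect can be sketched with two hypothetical namespaces, one that still has the inner f() and one where it has been "commented out". Again, the return values only serve to show which function was found.

```cpp
#include <string>

std::string f () { return "outer"; }

namespace with_inner
{
  std::string f () { return "inner"; }

  // Unqualified lookup finds with_inner::f() first.
  std::string g () { return f (); }
}

namespace without_inner
{
  // Inner f() "commented out": lookup continues outwards and
  // quietly finds ::f() instead, with no diagnostic at all.
  std::string g () { return f (); }
}
```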

What’s worse is that in complex cases involving implicit conversions of arguments, some calls may be successfully resolved to an overloaded or outer version while others will trigger an error. As a result, you may not even realize that you didn’t see all the call sites.

Ok, so that approach didn’t work in my case. What else can we try? Another option is to comment out just the definition of the function and see if we get any unresolved symbol errors during linking. There are many problems with this method as well. First of all, if the function in question is virtual, then this method won’t work because the virtual function table will always contain a reference to the function. Plus, calls made through a pointer or reference go through the vtable, so they don’t reference the function directly.

If the function is not virtual, then, at best, the linker will tell you that there is an undefined reference in a specific function in a specific translation unit. For example, here is the output from GNU Binutils ld:

/tmp/ccXez0jI.o: In function `main':
test.cxx:(.text+0x10): undefined reference to `f()'
test.cxx:(.text+0x15): undefined reference to `f()'

In particular, there will be no line information, so if a function calls the function of interest multiple times, we will have no way of knowing which call triggered the undefined symbol.

This approach also won’t work if we are building a shared library (unless we are using the -no-undefined or an equivalent option) because the undefined reference won’t be reported until we link the library to an executable or try to load it (e.g., with dlopen()). And when that happens, all we get is a note that there is an undefined reference somewhere in the library:

libtest.so: undefined reference to `f()'

In my case, since ODB is implemented as a shared library, all this method did was confirm that I still had a call to the old version of the function. However, I had no idea even which file(s) contained these calls.

As it happens, just the day before I was testing ODB with GCC in the C++11 mode. While everything worked fine, I got a few warnings about std::auto_ptr being deprecated. As I saw them scrolling by, I made an idle note to myself that when compiling in the C++11 mode, libstdc++ probably marks auto_ptr with the GCC deprecated attribute. A day later this background note went off like a light bulb in my head: I can mark the old version of the function as deprecated, and GCC will pinpoint every single place where this function is called with a warning:

class traverser
{
  void traverse (type&);
 
  void traverse (class_&) __attribute__ ((deprecated));
};

And the diagnostic is:

model.cxx: In function ‘void object_columns::traverse(data_member&)’:
model.cxx:22:9: warning: ‘void traverser::traverse(class_&)’ is
deprecated

This method is also very handy for finding out which overloaded version was selected by the compiler without resorting to a runtime test:

void f (bool) __attribute__ ((deprecated));
void f (int) __attribute__ ((deprecated));
void f (double) __attribute__ ((deprecated));
 
void g ()
{
  f (true);
  f (123);
  f (123.1);
}

And the output is:

test.cxx:7:10: warning: ‘void f(bool)’ is deprecated
test.cxx:8:9: warning: ‘void f(int)’ is deprecated
test.cxx:9:11: warning: ‘void f(double)’ is deprecated

The obvious drawback of this method is that it relies on a GCC-specific extension, though some other compilers (Clang and probably Intel C++ for Linux) also support it. If you know of similar functionality in other compilers and/or IDEs, please mention it in the comments.
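On compilers that implement C++14, the standard [[deprecated]] attribute (optionally with a message) can serve the same purpose. Below is a sketch of the traverser example using it; the type and class_ definitions are minimal stand-ins, and the return values exist only to make the selected overload visible.

```cpp
#include <string>

struct type { virtual ~type () {} };
struct class_: type {};

struct traverser
{
  std::string traverse (type&) { return "new"; }

  // Standard C++14 spelling. Any call that selects this overload makes
  // the compiler warn at that call site; calls already ported to the
  // type& version compile cleanly.
  [[deprecated ("port to traverse (type&)")]]
  std::string traverse (class_&) { return "old"; }
};
```

Adding a call such as traverse (c) with a class_ object c should then produce a deprecation warning at that line, much like the GCC attribute does.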

Do we need std::buffer?

Tuesday, August 9th, 2011

Or, boost::buffer for starters?

A few days ago I was again wishing that there was a standard memory buffer abstraction in C++. I have already had to invent my own classes for XSD and XSD/e (XML Schema to C++ compilers) where they are used for mapping the XML Schema hexBinary and base64Binary types to C++. Now I have the same problem in ODB (an ORM system for C++) where I need a suitable C++ type for representing database BLOB types. This time I have decided against creating yet another copy of my own buffer class and instead to use the poor man’s “standard” buffer, std::vector<char>, with its unnatural interface and all.

The abstraction I am wishing for is a simple class for encapsulating the memory management of a raw memory buffer plus providing a few common operations, such as memcpy, memset, etc. So instead of writing this:

class person
{
public:
  person (char* key_data, std::size_t key_size)
    : key_size_ (key_size)
  {
    key_data_ = new char[key_size];
    std::memcpy (key_data_, key_data, key_size);
  }
 
  ~person ()
  {
    delete[] key_data_; // Allocated with new[], so delete[] is required.
  }
 
  ...
 
  char* key_data_;
  std::size_t key_size_;
};

Or having to create yet another custom buffer class, we could do this:

class person
{
public:
  person (char* key_data, std::size_t key_size)
    : key_ (key_data, key_size)
  {
  }
 
  ...
 
  std::buffer key_;
};
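Until something like std::buffer exists, vector<char> is the fallback I went with in ODB. For comparison, here is roughly what person looks like with it; this is a sketch, and the key() accessor is added purely for illustration. As a bonus, this version gets copying right for free, whereas the raw-pointer version above would delete the same block twice if a person object were copied.

```cpp
#include <cstddef>
#include <vector>

class person
{
public:
  person (const char* key_data, std::size_t key_size)
    : key_ (key_data, key_data + key_size) // Range, not (data, size).
  {
  }

  // Hypothetical accessor, only here to make the example usable.
  const std::vector<char>& key () const { return key_; }

private:
  std::vector<char> key_; // No destructor or copy code needed.
};
```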

Above I called vector<char> a poor man’s “standard” buffer. But what exactly is wrong with using it to manage a memory buffer? While it works reasonably well functionally, the interface is unnatural and some operations may not be as efficient as we would expect from a memory buffer. Let’s examine the most prominent examples of these issues.

The first problem is with how we access the underlying memory. The C++ standard defect report (DR) 464 added the data() member function to std::vector which returns a pointer to the buffer. However, there are still compilers in use that do not support this, notably GCC 3.4 and VC++ 2008/9.0. As a result, if you want your code to be portable, you will need to use the much less intuitive &b.front() expression:

vector<char> b = ...
memcpy (out, &b.front (), b.size ());

There is also a subtle issue with using front(). While it appears to be legal to call data() on an empty buffer (as long as we don’t dereference the returned pointer), it is illegal to call front(). This means that you may have to handle an empty buffer as a special case, further complicating your code:

vector<char> b = ...
memcpy (out, (b.empty () ? 0 : &b.front ()), b.size ());
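One way to keep that special case in a single place is a small helper. This is a sketch, and buffer_data() is a hypothetical name:

```cpp
#include <vector>

// Portable stand-in for b.data () on compilers without DR 464:
// &b.front () is undefined for an empty vector, so return null instead.
inline char* buffer_data (std::vector<char>& b)
{
  return b.empty () ? 0 : &b.front ();
}

inline const char* buffer_data (const std::vector<char>& b)
{
  return b.empty () ? 0 : &b.front ();
}
```

Call sites can then use buffer_data (b) wherever a raw pointer is needed. Strictly speaking, the C standard requires valid pointers even for a zero-size memcpy(), so a call like the one above is best guarded with if (!b.empty ()) rather than passed a null pointer.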

The initialization of a buffer is also inconvenient and potentially inefficient. Let’s say we want to have an uninitialized buffer of 1024 bytes which we plan to fill in later. There is no way to do that with vector<char>. The best we can do is to have every byte initialized:

vector<char> b (1024); // Zero-initialized buffer.

If we want to create a buffer initialized with contents of a memory fragment, the interface we have to use is cumbersome:

vector<char> b (data, data + size);

What we want to write instead is this:

buffer b (data, size);

This initialization is also potentially inefficient. Depending on the quality of the implementation, std::vector may end up using a for loop instead of memcpy to copy the data. In fact, that’s exactly how it is done in GCC 4.5 and VC++ 2010/10.0 (Correction: as was pointed out in the comments, both GCC 4.5 and VC++ 10 optimize the case where the vector element is POD).
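In the meantime, a thin helper can at least give vector<char> the (data, size) shape we would like. make_buffer() is a hypothetical name, and since it is built on the same range constructor, it is only as efficient as the library’s implementation of that constructor:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: vector<char> initialized from a memory fragment.
inline std::vector<char> make_buffer (const void* data, std::size_t size)
{
  const char* p = static_cast<const char*> (data);
  return std::vector<char> (p, p + size);
}
```

With it, the initialization becomes std::vector<char> b (make_buffer (data, size)).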

So I think it is quite clear that while vector<char> is workable, it is not particularly convenient or efficient.

Also, as it turns out, this is not the first time I have played with the idea of a dedicated buffer class in C++. A couple of months ago I started a thread on the Boost developer mailing list trying to see if there would be any interest in a simple buffer library in Boost. The result wasn’t very encouraging. The thread quickly splintered into discussions of various special-purpose, buffer-like data structures that people have in their applications.

On the other hand, I mentioned the buffer class at BoostCon 2011 to a couple of Boost users and got very positive responses, along the “If it were there we would use it!” lines. That’s when I got the idea of writing this article in an attempt to get feedback from the broader C++ community rather than from just the hard-core Boost developers (only they can withstand the boost-dev mailing list traffic).

While the above discussion should give you a pretty good idea about the kind of buffer class I am talking about, below I am going to show a proposed interface and provide a complete, header-only implementation (released under the Boost license), in case you would like to give it a try.

class buffer
{
public:
  typedef std::size_t size_type;
  static const size_type npos = -1;
 
  ~buffer ();
 
  explicit buffer (size_type size = 0);
  buffer (size_type size, size_type capacity);
  buffer (const void* data, size_type size);
  buffer (const void* data, size_type size, size_type capacity);
  buffer (void* data, size_type size, size_type capacity,
          bool assume_ownership);
 
  buffer (const buffer&);
  buffer& operator= (const buffer&);
 
  void swap (buffer&);
  char* detach ();
 
  void assign (const void* data, size_type size);
  void assign (void* data, size_type size, size_type capacity,
               bool assume_ownership);
  void append (const buffer&);
  void append (const void* data, size_type size);
  void fill (char value = 0);
 
  size_type size () const;
  bool size (size_type);
  size_type capacity () const;
  bool capacity (size_type);
  bool empty () const;
  void clear ();
 
  char* data ();
  const char* data () const;
 
  char& operator[] (size_type);
  char operator[] (size_type) const;
  char& at (size_type);
  char at (size_type) const;
 
  size_type find (char, size_type pos = 0) const;
  size_type rfind (char, size_type pos = npos) const;
 
private:
  char* data_;
  size_type size_;
  size_type capacity_;
  bool free_;
};
 
bool operator== (const buffer&, const buffer&);
bool operator!= (const buffer&, const buffer&);

Most of the interface should be self-explanatory. The last overloaded constructor allows us to create a buffer by reusing an existing memory block. If the assume_ownership argument is true, then the buffer object will free the memory using delete[]. The detach() function is the mirror image of this functionality in that it allows us to detach the underlying memory block and reuse it in some other way. After the call to detach(), the buffer object becomes empty, and we should eventually free the returned memory using delete[]. The size() and capacity() modifiers return true to indicate that the underlying buffer address has changed, in case we cached it somewhere.
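The full header-only implementation is not reproduced here. To make the ownership and "address changed" semantics above concrete, here is a much smaller sketch covering a subset of the interface (no assign(), at(), find(), rfind(), or clear()). The doubling growth policy in append(), leaving new bytes uninitialized in size(), and the requirement that append() is never passed a pointer into the buffer itself are all my assumptions.

```cpp
#include <algorithm>  // std::max, std::swap
#include <cstddef>    // std::size_t
#include <cstring>    // std::memcpy, std::memcmp, std::memset

// Sketch of a subset of the proposed interface. Memory the buffer owns
// is always allocated with new[] and released with delete[].
class buffer
{
public:
  typedef std::size_t size_type;

  explicit buffer (size_type size = 0)
    : data_ (size != 0 ? new char[size] : 0),
      size_ (size), capacity_ (size), free_ (true) {}

  buffer (const void* data, size_type size)
    : data_ (size != 0 ? new char[size] : 0),
      size_ (size), capacity_ (size), free_ (true)
  {
    if (size != 0) std::memcpy (data_, data, size);
  }

  // Reuse an existing block; delete[] it later only if we own it.
  buffer (void* data, size_type size, size_type capacity, bool assume_ownership)
    : data_ (static_cast<char*> (data)),
      size_ (size), capacity_ (capacity), free_ (assume_ownership) {}

  buffer (const buffer& x)
    : data_ (x.size_ != 0 ? new char[x.size_] : 0),
      size_ (x.size_), capacity_ (x.size_), free_ (true)
  {
    if (size_ != 0) std::memcpy (data_, x.data_, size_);
  }

  buffer& operator= (const buffer& x)
  {
    buffer tmp (x); // Copy-and-swap: *this is untouched if new[] throws.
    swap (tmp);
    return *this;
  }

  ~buffer () { if (free_) delete[] data_; }

  void swap (buffer& x)
  {
    std::swap (data_, x.data_);         std::swap (size_, x.size_);
    std::swap (capacity_, x.capacity_); std::swap (free_, x.free_);
  }

  // Hand the (owned) block over; the caller must eventually delete[] it.
  char* detach ()
  {
    char* r = data_;
    data_ = 0; size_ = capacity_ = 0; free_ = true;
    return r;
  }

  void append (const void* data, size_type size)
  {
    if (size == 0) return;
    if (size > capacity_ - size_)
      capacity (std::max (size_ + size, 2 * capacity_));
    std::memcpy (data_ + size_, data, size);
    size_ += size;
  }

  void fill (char value = 0) { if (size_ != 0) std::memset (data_, value, size_); }

  size_type size () const { return size_; }
  size_type capacity () const { return capacity_; }
  bool empty () const { return size_ == 0; }

  // Both modifiers return true if the block moved (cached pointers are stale).
  bool capacity (size_type c)
  {
    if (c <= capacity_) return false;
    char* d = new char[c];
    if (size_ != 0) std::memcpy (d, data_, size_);
    if (free_) delete[] data_;
    data_ = d; capacity_ = c; free_ = true;
    return true;
  }

  bool size (size_type s)
  {
    bool moved = s > capacity_ && capacity (s);
    size_ = s;
    return moved;
  }

  char* data () { return data_; }
  const char* data () const { return data_; }
  char& operator[] (size_type i) { return data_[i]; }

private:
  char* data_;
  size_type size_;
  size_type capacity_;
  bool free_;
};

inline bool operator== (const buffer& x, const buffer& y)
{
  return x.size () == y.size () &&
    (x.empty () || std::memcmp (x.data (), y.data (), x.size ()) == 0);
}

inline bool operator!= (const buffer& x, const buffer& y) { return !(x == y); }
```

For example, after buffer b (8), the call b.size (16) returns true because the block had to be reallocated, while a later b.size (4) returns false since shrinking keeps the same block.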

So, do you think we need something like this in Boost and perhaps in the C++ standard library? Do you like the proposed interface?