A Sense of Design » C++ Compilers

Archive for the ‘C++ Compilers’ Category

Do we need `std::buffer`?

Tuesday, August 9th, 2011

Or, boost::buffer for starters?

A few days ago I was again wishing that there was a standard memory buffer abstraction in C++. I have already had to invent my own classes for XSD and XSD/e (XML Schema to C++ compilers) where they are used for mapping the XML Schema hexBinary and base64Binary types to C++. Now I have the same problem in ODB (an ORM system for C++) where I need a suitable C++ type for representing database BLOB types. This time I have decided against creating another copy of my own buffer class and instead use the poor man’s “standard” buffer, std::vector<char>, with its unnatural interface and all.

The abstraction I am wishing for is a simple class for encapsulating the memory management of a raw memory buffer plus providing a few common operations, such as memcpy, memset, etc. So instead of writing this:

class person
{
public:
  person (char* key_data, std::size_t key_size)
    : key_size_ (key_size)
  {
    key_data_ = new char[key_size];
    std::memcpy (key_data_, key_data, key_size);
  }
 
  ~person ()
  {
    delete[] key_data_;
  }
 
  ...
 
  char* key_data_;
  std::size_t key_size_;
};

Or having to create yet another custom buffer class, we could do this:

class person
{
public:
  person (char* key_data, std::size_t key_size)
    : key_ (key_data, key_size)
  {
  }
 
  ...
 
  std::buffer key_;
};

Above I called vector<char> a poor man’s “standard” buffer. But what exactly is wrong with using it to manage a memory buffer? While it works reasonably well functionally, the interface is unnatural and some operations may not be as efficient as we would expect from a memory buffer. Let’s examine the most prominent examples of these issues.

The first problem is with how we access the underlying memory. The C++ standard defect report (DR) 464 added the data() member function to std::vector which returns a pointer to the buffer. However, there are still compilers in use that do not support this, notably GCC 3.4 and VC++ 2008/9.0. As a result, if you want your code to be portable, you will need to use the much less intuitive &b.front() expression:

vector<char> b = ...
memcpy (out, &b.front (), b.size ());

There is also a subtle issue with using front(). While it appears to be legal to call data() on an empty buffer (as long as we don’t dereference the returned pointer), it is illegal to call front(). This means that you may have to handle an empty buffer as a special case, further complicating your code:

vector<char> b = ...
memcpy (out, (b.empty () ? 0 : &b.front ()), b.size ());
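
One way to keep this special case out of the calling code is to wrap it in a small helper. The following is a minimal sketch; the names buffer_data and copy_out are hypothetical and not part of any library:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: returns a pointer to the first byte, or a null
// pointer for an empty vector, without calling front() on an empty
// container.
inline const char*
buffer_data (const std::vector<char>& b)
{
  return b.empty () ? 0 : &b.front ();
}

// Hypothetical helper: copies the contents of b to out and returns the
// number of bytes copied. memcpy is skipped for an empty buffer so it
// is never handed a null source pointer.
inline std::size_t
copy_out (char* out, const std::vector<char>& b)
{
  assert (out != 0 || b.empty ());

  if (!b.empty ())
    std::memcpy (out, buffer_data (b), b.size ());

  return b.size ();
}
```

With such a helper the empty-buffer check lives in one place instead of being repeated at every memcpy call.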

The initialization of a buffer is also inconvenient and potentially inefficient. Let’s say we want to have an uninitialized buffer of 1024 bytes which we plan to fill in later. There is no way to do that with vector<char>. The best we can do is to have every byte initialized:

vector<char> b (1024); // Zero-initialized buffer.

If we want to create a buffer initialized with contents of a memory fragment, the interface we have to use is cumbersome:

vector<char> b (data, data + size);

What we want to write instead is this:

buffer b (data, size);

This initialization is also potentially inefficient. Depending on the quality of the implementation, std::vector may end up using a for loop instead of memcpy to copy the data. In fact, that’s exactly how it is done in GCC 4.5 and VC++ 2010/10.0 (Correction: as was pointed out in the comments, both GCC 4.5 and VC++ 10 optimize the case where the vector element is POD).

So I think it is quite clear that while vector<char> is workable, it is not particularly convenient or efficient.

Also, as it turns out this is not the first time I am playing with the idea of a dedicated buffer class in C++. A couple of months ago I started a thread on the Boost developer mailing list trying to see if there would be any interest in a simple buffer library in Boost. The result wasn’t very encouraging. The thread quickly splintered into discussions of various special-purpose, buffer-like data structures that people have in their applications.

On the other hand, I mentioned the buffer class at BoostCon 2011 to a couple of Boost users and got very positive responses, along the “If it were there we would use it!” lines. That’s when I got the idea of writing this article in an attempt to get feedback from the broader C++ community rather than from just the hard-core Boost developers (only they can withstand the boost-dev mailing list traffic).

While the above discussion should give you a pretty good idea about the kind of buffer class I am talking about, below I am going to show a proposed interface and provide a complete, header-only implementation (released under the Boost license), in case you would like to give it a try.

class buffer
{
public:
  typedef std::size_t size_type;
  static const size_type npos = -1;
 
  ~buffer ();
 
  explicit buffer (size_type size = 0);
  buffer (size_type size, size_type capacity);
  buffer (const void* data, size_type size);
  buffer (const void* data, size_type size, size_type capacity);
  buffer (void* data, size_type size, size_type capacity,
          bool assume_ownership);
 
  buffer (const buffer&);
  buffer& operator= (const buffer&);
 
  void swap (buffer&);
  char* detach ();
 
  void assign (const void* data, size_type size);
  void assign (void* data, size_type size, size_type capacity,
               bool assume_ownership);
  void append (const buffer&);
  void append (const void* data, size_type size);
  void fill (char value = 0);
 
  size_type size () const;
  bool size (size_type);
  size_type capacity () const;
  bool capacity (size_type);
  bool empty () const;
  void clear ();
 
  char* data ();
  const char* data () const;
 
  char& operator[] (size_type);
  char operator[] (size_type) const;
  char& at (size_type);
  char at (size_type) const;
 
  size_type find (char, size_type pos = 0) const;
  size_type rfind (char, size_type pos = npos) const;
 
private:
  char* data_;
  size_type size_;
  size_type capacity_;
  bool free_;
};
 
bool operator== (const buffer&, const buffer&);
bool operator!= (const buffer&, const buffer&);

Most of the interface should be self-explanatory. The last overloaded constructor allows us to create a buffer by reusing an existing memory block. If the assume_ownership argument is true, then the buffer object will free the memory using delete[]. The detach() function is the mirror side of this functionality in that it allows us to detach the underlying memory block and reuse it in some other way. After the call to detach() the buffer object becomes empty and we should eventually free the returned memory using delete[]. The size() and capacity() modifiers return true to indicate that the underlying buffer address has changed, in case we cached it somewhere.
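
To make the ownership and address-change semantics concrete, here is a minimal sketch of a small subset of the interface. It is an illustration only, not the implementation released with this article; the class name mini_buffer is hypothetical and many members (append, find, etc.) are left out:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <algorithm>
#include <utility>

class mini_buffer
{
public:
  typedef std::size_t size_type;

  explicit mini_buffer (size_type size = 0)
    : data_ (size != 0 ? new char[size] : 0),
      size_ (size), capacity_ (size), free_ (true) {}

  mini_buffer (const void* data, size_type size)
    : data_ (size != 0 ? new char[size] : 0),
      size_ (size), capacity_ (size), free_ (true)
  {
    assert (data != 0 || size == 0);
    if (size != 0)
      std::memcpy (data_, data, size);
  }

  // Reuse an existing block; it is freed with delete[] only if owned.
  mini_buffer (void* data, size_type size, size_type capacity,
               bool assume_ownership)
    : data_ (static_cast<char*> (data)),
      size_ (size), capacity_ (capacity), free_ (assume_ownership) {}

  mini_buffer (const mini_buffer& x)
    : data_ (x.capacity_ != 0 ? new char[x.capacity_] : 0),
      size_ (x.size_), capacity_ (x.capacity_), free_ (true)
  {
    if (size_ != 0)
      std::memcpy (data_, x.data_, size_);
  }

  mini_buffer& operator= (const mini_buffer& x)
  {
    mini_buffer tmp (x); // copy-and-swap
    swap (tmp);
    return *this;
  }

  ~mini_buffer () { if (free_) delete[] data_; }

  void swap (mini_buffer& x)
  {
    std::swap (data_, x.data_);
    std::swap (size_, x.size_);
    std::swap (capacity_, x.capacity_);
    std::swap (free_, x.free_);
  }

  // Hand the block over to the caller and become empty.
  char* detach ()
  {
    char* r (data_);
    data_ = 0; size_ = 0; capacity_ = 0;
    return r;
  }

  // Grow the capacity; return true if the block address has changed.
  bool capacity (size_type c)
  {
    if (c <= capacity_)
      return false;

    char* d (new char[c]);
    if (size_ != 0)
      std::memcpy (d, data_, size_);
    if (free_)
      delete[] data_;

    data_ = d; capacity_ = c; free_ = true;
    return true;
  }

  // Resize; return true if the block address has changed.
  bool size (size_type s)
  {
    bool r (capacity (s));
    size_ = s;
    return r;
  }

  size_type size () const { return size_; }
  size_type capacity () const { return capacity_; }
  char* data () { return data_; }
  const char* data () const { return data_; }

private:
  char* data_;
  size_type size_;
  size_type capacity_;
  bool free_;
};
```

Note that in this sketch the bytes exposed by growing the size are left uninitialized, which is exactly what vector<char> cannot offer.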

So, do you think we need something like this in Boost and perhaps in the C++ standard library? Do you like the proposed interface?

Posted in VC++, Design, GCC g++, C++ | 17 Comments »

Smart pointers in Boost, TR1, and C++0x

Monday, May 24th, 2010

This post is an overview of the smart pointers available in Boost, TR1, and C++0x. It also touches on the availability and portability of the last two options when it comes to various C++ compilers.

General-purpose smart pointers can be divided into two categories: shared pointers and unique pointers. With shared pointers there could be multiple instances of the smart pointer pointing to the same object. Shared pointers normally use some form of reference counting to manage the lifetime of the object they point to. Unique pointers have the restriction of only one instance of the smart pointer managing the object.

Shared pointer implementations are normally differentiated by the location of the reference counter. The two most commonly used approaches are having the counter embedded into the object itself (intrusive reference counter) and allocating the counter separately, normally on the heap. Another, less frequently used approach is to allocate the counter in the same block of memory as the object itself.

Unique pointer implementations are normally differentiated by the way they handle pointer copying and copy assignment. C++-98 std::auto_ptr is a unique pointer that transfers the ownership of the object from the source pointer to the newly created pointer in case of the copy construction or to the left hand side in case of the copy assignment.

Boost

Boost includes an assortment of smart pointers which are grouped into the header-only smart_ptr sub-library. The current release contains the following variants:

scoped_ptr	<boost/scoped_ptr.hpp>
intrusive_ptr	<boost/intrusive_ptr.hpp>
shared_ptr	<boost/shared_ptr.hpp>
weak_ptr	<boost/weak_ptr.hpp>

Boost scoped_ptr is a unique pointer implementation that does not support copying or copy assignment. Nor does it support returning a scoped_ptr instance from a function. intrusive_ptr provides support for objects with an embedded reference counter. This pointer calls the intrusive_ptr_add_ref(T*) and intrusive_ptr_release(T*) functions to manage the object’s lifetime. You are expected to provide suitable implementations of these functions for your object types.
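
As an illustration, the two functions for a class with an embedded counter could look like the sketch below. The node class and its live counter are hypothetical; the counter operations are not thread-safe; and the code does not include Boost, so the functions are only called by hand here. With Boost, boost::intrusive_ptr<node> would make these calls for you.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class with an embedded (intrusive) reference counter.
struct node
{
  node (): refs (0) { ++live; }
  ~node () { --live; }

  std::size_t refs; // embedded reference counter
  static int live;  // number of node objects alive (illustration only)
};

int node::live = 0;

// Called when a new pointer starts referring to the object.
inline void
intrusive_ptr_add_ref (node* p)
{
  ++p->refs;
}

// Called when a pointer stops referring to the object; the last
// release deletes it.
inline void
intrusive_ptr_release (node* p)
{
  assert (p->refs != 0);

  if (--p->refs == 0)
    delete p;
}
```

The functions are defined in the same namespace as the class so that they can be found by argument-dependent lookup.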

Boost shared_ptr is a shared pointer implementation that uses a separate reference counter allocated on the heap. weak_ptr is a companion pointer which points to the object owned by shared_ptr without having an increment in the reference counter. It is primarily useful to resolve cycles in an object ownership graph that would otherwise prevent the objects in the graph from ever being deleted.
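
The cycle-breaking role of weak_ptr can be sketched as follows. To keep the example free of external dependencies it assumes a C++-0x standard library and uses std::shared_ptr and std::weak_ptr, which have the same interface as the Boost versions for this purpose. The parent and child classes and the live_objects counter are hypothetical:

```cpp
#include <memory>

int live_objects = 0; // illustration only: objects currently alive

struct parent;

struct child
{
  child () { ++live_objects; }
  ~child () { --live_objects; }

  // Non-owning back link. If this were a shared_ptr, parent and child
  // would keep each other alive forever.
  std::weak_ptr<parent> owner;
};

struct parent
{
  parent () { ++live_objects; }
  ~parent () { --live_objects; }

  std::shared_ptr<child> kid; // owning link
};

inline std::shared_ptr<parent>
make_family ()
{
  std::shared_ptr<parent> p (new parent);
  p->kid.reset (new child);
  p->kid->owner = p; // does not increment the strong reference count
  return p;
}
```

When the last shared_ptr to the parent goes away, the parent is deleted, which in turn releases the child. To use the back link, the child calls owner.lock (), which yields an empty shared_ptr if the parent is already gone.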

One common criticism of implementations with separate reference counters such as Boost shared_ptr is the performance and memory usage penalty incurred by the separate allocation of the reference counter. To mitigate this issue Boost shared_ptr provides two helper functions, make_shared() and allocate_shared(), that allow you to allocate the reference counter and the object itself as a single memory block. There are, however, other penalties and limitations associated with this approach.

Firstly, if you have a weak_ptr instance pointing to an object that has already been deleted (that is, there are no more shared_ptr instances pointing to this object), then that weak_ptr will prevent the memory that was used for the object from being freed. This is because the reference counter used by shared_ptr and weak_ptr is only freed when there are no more instances of either pointer type. And since the counter and the object are allocated as a single block of memory, they can only be freed together.

The other drawback of the make_shared() implementation is the increase in the object code size. Due to the way this optimization is implemented, an additional virtual table as well as a set of virtual functions will be instantiated for each object type that you use with make_shared().

Finally, make_shared() will need access to the object’s constructor. This, for example, breaks the canonical object factory implementation where the object’s constructor is made private to prevent direct construction and the factory is made a friend of the object’s class. Making make_shared() a friend is not easy either since it is actually a set of overloaded function templates.
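
The factory case can be sketched like this. The example uses std::shared_ptr from a C++-0x standard library rather than Boost to stay self-contained, and the widget and widget_factory names are hypothetical:

```cpp
#include <memory>

class widget
{
public:
  int id () const { return id_; }

private:
  // Private constructor: only the factory may create widgets.
  explicit widget (int id): id_ (id) {}
  friend class widget_factory;

  int id_;
};

class widget_factory
{
public:
  static std::shared_ptr<widget>
  create (int id)
  {
    // OK: the new-expression appears inside the friend class, so the
    // private constructor is accessible.
    return std::shared_ptr<widget> (new widget (id));

    // By contrast, std::make_shared<widget> (id) would construct the
    // object from inside the library, which is not a friend, and so
    // would normally fail to compile.
  }
};
```

The price of keeping the constructor private is therefore the separate allocation of the reference counter.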

To make use of Boost smart pointers in your application, you will need to add an external dependency on Boost. Since the smart_ptr library is header-only, you or users of your application won’t need to build anything in Boost. Boost is also fairly portable and can be used with most modern C++ compilers. The smart_ptr library in particular has been around for a while so even if all of Boost cannot be built with your compiler of choice, chances are you will be able to use the smart pointers.

TR1

Technical Report on C++ Library Extensions, commonly referred to as TR1, adds the shared_ptr smart pointer implementation to the std::tr1 namespace. The TR1 shared_ptr has the same interface as Boost shared_ptr. The only parts that are not available in TR1 are the make_shared() and allocate_shared() functions discussed above.

If TR1 shared_ptr is the same as (or, more precisely, slightly “less” than) Boost shared_ptr, you may be wondering why anyone would use the TR1 version. You may prefer to use TR1 shared_ptr because its implementation comes with the C++ compiler and your application does not need to have any extra dependencies. However, if you are already using Boost, then it doesn’t make much sense to use shared_ptr from TR1. Another potential advantage of the TR1 version is the compiler and platform-specific optimizations that can be implemented by the compiler vendors. The thread safety of the reference counter operations is one area where such optimizations can make a big difference. However, in practice and at this time, most implementations of the TR1 shared_ptr are copies of the code from Boost.

The following table summarizes the support for TR1 shared_ptr in widely-used C++ compilers:

GNU g++	since 4.0.0
MS Visual Studio (VC++)	since 2008 (9.0) with Feature Pack or SP1
Sun Studio (Sun CC)	not available in the latest release (12 Update 1)
IBM XL C++	since 9.0
HP aCC	not available in the latest release (A.06.25)
Intel C++	uses TR1 headers from GNU g++ or VC++

The TR1 specification requires that if new declarations are added to existing headers (and shared_ptr is added to <memory>), such declarations should not be visible to the application code by default. Instead, the application developer must take some special action to enable TR1 declarations. With current implementations you are either required to define a special macro or include the TR1 versions of the headers from a different directory. This can be a major hurdle in writing portable applications that use TR1.

From the above list, GNU g++ uses the separate header approach and requires that you include headers with the tr1/ prefix in order to get the TR1 declarations. Visual Studio disregards the TR1 specifications and enables TR1 by default for all applications. IBM XL C++ requires you to define the __IBMCPP_TR1__ macro. And Intel C++, since it uses the C++ standard library from GNU g++ on Linux/Mac OS X and from Visual Studio on Windows, will behave like one of the two compilers, depending on the platform. The following code fragment shows how we can include the TR1-enabled <memory> header in a portable manner:

#include <cstddef> // for __GLIBCXX__
 
#ifdef __GLIBCXX__
#  include <tr1/memory>
#else
#  ifdef __IBMCPP__
#    define __IBMCPP_TR1__
#  endif
#  include <memory>
#endif

Boost also provides an implementation of TR1 (since version 1.34.0) which is just a thin wrapper around other Boost libraries. So if the compiler version that you are using does not yet support TR1, you can fall back on the TR1 implementation from Boost. Note, however, that there are some compiler-specific issues that you may have to resolve if you want to include the TR1 headers using their standard names, for example <memory>. On the other hand, using Boost-specific headers, for example <boost/tr1/memory.hpp>, should work consistently across different compilers. See the Boost TR1 library documentation for details. The following code shows how to include the TR1-enabled <memory> header if the compiler provides one and how to fall back on the Boost implementation otherwise:

#include <cstddef> // __GLIBCXX__, _HAS_TR1
 
// GNU C++ or Intel C++ using libstdc++.
//
#if defined (__GNUC__) && __GNUC__ >= 4 && \
  defined (__GLIBCXX__)
#  include <tr1/memory>
//
// IBM XL C++.
//
#elif defined (__xlC__) && __xlC__ >= 0x0900
#  define __IBMCPP_TR1__
#  include <memory>
//
// VC++ or Intel C++ using VC++ standard library.
//
#elif defined (_MSC_VER) && (_MSC_VER == 1500 && \
  defined (_HAS_TR1) || _MSC_VER > 1500)
#  include <memory>
//
// Boost fall-back.
//
#else
#  include <boost/tr1/memory.hpp>
#endif

C++-0x

C++-0x moves the std::tr1::shared_ptr smart pointer to the std namespace and adds support for make_shared() and allocate_shared().

C++-0x also deprecates auto_ptr and adds a new unique pointer implementation called unique_ptr. The new implementation disables the copy constructor and copy assignment operator and instead provides the “move” constructor and assignment operator that use rvalue-references as their arguments. This still allows you to return a unique_ptr instance from a function with the ownership of the pointed-to object being automatically transferred from the function body to the caller. However, if you want to transfer the ownership from one instance of unique_ptr to another, you will have to do it explicitly with the std::move() call, for example:

struct s {};
std::unique_ptr<s> a (new s);
std::unique_ptr<s> b (std::move (a));
a = std::move (b);
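
The difference between the implicit transfer on return and the explicit transfer elsewhere can be sketched as follows. The make_s and consume functions are hypothetical, and the s class is given a data member here only so that there is something to observe:

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct s
{
  explicit s (int v): value (v) {}
  int value;
};

// Returning a local unique_ptr transfers the ownership to the caller
// implicitly.
inline std::unique_ptr<s>
make_s (int v)
{
  std::unique_ptr<s> p (new s (v));
  return p; // no std::move() needed here
}

// Taking the pointer by value: the caller has to pass an rvalue, for
// example by using std::move(), and gives up the ownership.
inline int
consume (std::unique_ptr<s> p)
{
  assert (p.get () != 0);
  return p->value;
} // the object is deleted here
```

A call such as consume (a) with a named unique_ptr a will not compile; it has to be spelled consume (std::move (a)), which makes the transfer of ownership visible at the call site.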

At this point only a few C++ compilers support C++-0x and this support is incomplete and experimental. Currently only GCC g++ (4.3 or later), VC++ (10.0) and Intel C++ (11.0) provide enough C++-0x language support to be able to implement shared_ptr and unique_ptr as specified in the draft of the standard.

So which smart pointer implementation should you use in your application? If you have the luxury of using C++-0x, then the choice is pretty straightforward: use std::shared_ptr for shared pointers and std::unique_ptr for unique pointers. The rvalue-aware implementations of these pointers are too good to ignore. For the rest of us who cannot yet use C++-0x, the unique pointer is the old faithful std::auto_ptr and the choice for a shared pointer is between using the compiler-provided one from TR1 or the Boost implementation. If your application is already using Boost, then the choice seems pretty straightforward as well: use Boost and forget about different compiler versions, etc. On the other hand, if your goal is to minimize the external library dependencies, it may be worthwhile to try to use the native TR1 implementation on modern (and thus more popular) C++ compilers and fall back on Boost when TR1 is not available.

You may also find the following articles relevant to this topic:

Posted in C++ Compilers, C++ | Comments Off

Parsing C++ with GCC plugins, Part 3

Monday, May 17th, 2010

This is the third installment in the series of posts about parsing C++ with GCC plugins. In the previous post we covered the basics of the GCC AST (abstract syntax tree) as well as learned how to traverse all the declarations in the translation unit. This post is dedicated to types. In particular, we will learn how to access various parts of the class definition, such as its bases, member variables, member functions, nested type declarations, etc. At the end we will have a working plugin that prints all this information for every class defined in the translation unit.

All type nodes in the GCC AST have tree codes that end with _TYPE. To get a type node from a declaration node we use the TREE_TYPE macro. If a declaration has no type, such as NAMESPACE_DECL, then this macro returns NULL. Here is how we can improve the print_decl() function from the previous post to also print the declaration’s type’s tree code:

void
print_decl (tree decl)
{
  int tc (TREE_CODE (decl));
  tree id (DECL_NAME (decl));
  const char* name (id
                    ? IDENTIFIER_POINTER (id)
                    : "<unnamed>");
 
  cerr << tree_code_name[tc] << " " << name;
 
  if (tree t = TREE_TYPE (decl))
    cerr << " type " << tree_code_name[TREE_CODE (t)];
 
  cerr << " at " << DECL_SOURCE_FILE (decl)
       << ":" << DECL_SOURCE_LINE (decl) << endl;
}

If we now run the modified plugin on the following C++ code fragment:

class c {};
typedef const c* p;
int i;

We will get the following output:

type_decl c type record_type at test.cxx:1
type_decl p type pointer_type at test.cxx:2
var_decl i type integer_type at test.cxx:3

The most commonly seen AST types can be divided into three categories:

Fundamental Types

VOID_TYPE
REAL_TYPE
BOOLEAN_TYPE
INTEGER_TYPE

Derived Types

POINTER_TYPE
REFERENCE_TYPE
ARRAY_TYPE

User-Defined Types

RECORD_TYPE
UNION_TYPE
ENUMERAL_TYPE

Some node types, such as REAL_TYPE and INTEGER_TYPE, cover several fundamental types. In this case the AST has a separate node instance for each specific fundamental type. For example, the integer_type_node is a global variable that holds a pointer to the INTEGER_TYPE node corresponding to the int type. For the derived types (here the term derived type means pointer, reference, or array type rather than C++ class inheritance), the TREE_TYPE macro returns the pointed-to, referenced, or element type, respectively. The RECORD_TYPE nodes represent struct and class types.

You might also expect that GCC has a separate node kind to represent const/volatile/restrict-qualified (cvr-qualified) types. This is not the case. Instead, each type node contains a cvr-qualifier. So when the source code defines a const variant of some type, GCC creates a copy of the original type node and sets the const-qualifier on the copy to true. To check whether a type has one of the qualifiers set, you can use the CP_TYPE_CONST_P, CP_TYPE_VOLATILE_P, and CP_TYPE_RESTRICT_P macros.

The above design decision has one important implication: the AST can contain multiple type nodes for the same C++ type. In fact, according to the GCC documentation, the copies may not even have different cvr-qualifiers. In other words, the AST can use two identical nodes to represent the same type for no apparent reason. As a result, you shouldn’t use tree node pointer comparison to decide whether you are dealing with the same type. Instead, the GCC documentation recommends that you use the same_type_p predicate.

One macro that is especially useful in dealing with the multiple nodes situation is TYPE_MAIN_VARIANT. This macro returns the primary, cvr-unqualified type from which all the cvr-qualified and other copies have been made. In particular, this macro allows you to use the type node pointer in a set or as a map key, which is not possible with same_type_p.
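
The idea of keying containers on the main variant can be shown with a toy model. The sketch below does not use the GCC headers: type_node is a hypothetical stand-in for a type tree node and its main_variant member plays the role of TYPE_MAIN_VARIANT:

```cpp
#include <map>
#include <string>

// Toy stand-in for a type node. This is NOT GCC's tree structure; all
// the names here are hypothetical.
struct type_node
{
  std::string name;              // spelling of the unqualified type
  bool is_const;                 // stands in for the cvr-qualifier
  const type_node* main_variant; // plays the role of TYPE_MAIN_VARIANT
};

// Map keyed on the main variant so that every copy of a type, qualified
// or not, ends up in the same entry.
typedef std::map<const type_node*, int> use_count_map;

inline void
count_use (use_count_map& m, const type_node& t)
{
  ++m[t.main_variant];
}
```

Keying the map on the node address itself would create a separate entry for every copy of the type, which is the pointer comparison pitfall described above.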

Let’s now concentrate on the RECORD_TYPE nodes which represent the class types. The first thing that you will probably want to do once you are handed a class node is to find its name. Well, that’s actually a fairly tricky task in the GCC AST. In fact, I would say it is the most convoluted area, outdone, maybe, only by the parts of the AST dealing with C++ templates. Let’s try to unravel this from the other side, notably the type declaration side.

In the GCC AST types don’t have names. Instead, types are declared to have names using type declarations (TYPE_DECL tree node). This may seem unnatural to you since in C++ user-defined types do have names, for example:

class c {};

While that’s true, the AST treats the above declaration as if it was declared like this:

typedef class {} c;

The problem with this approach is how to distinguish the following two cases:

class c {}; // AST: typedef class {} c;
typedef c t;

To distinguish such cases the TYPE_DECL nodes that are “imagined” by the compiler are marked as artificial which can be tested with the DECL_ARTIFICIAL macro. Let’s add the print_class() function and modify print_decl() to test this out:

void
print_class (tree type)
{
  cerr << "class ???" << endl;
}
 
void
print_decl (tree decl)
{
  tree type (TREE_TYPE (decl));
  int dc (TREE_CODE (decl));
  int tc;
 
  if (type)
  {
    tc = TREE_CODE (type);
 
    if (dc == TYPE_DECL && tc == RECORD_TYPE)
    {
      // If DECL_ARTIFICIAL is true this is a class
      // declaration. Otherwise this is a typedef.
      //
      if (DECL_ARTIFICIAL (decl))
      {
        print_class (type);
        return;
      }
    }
  }
 
  tree id (DECL_NAME (decl));
  const char* name (id
                    ? IDENTIFIER_POINTER (id)
                    : "<unnamed>");
 
  cerr << tree_code_name[dc] << " "
       << decl_namespace (decl) << "::" << name;
 
  if (type)
    cerr << " type " << tree_code_name[tc];
 
  cerr << " at " << DECL_SOURCE_FILE (decl)
       << ":" << DECL_SOURCE_LINE (decl) << endl;
}

If we now run this modified version of our plugin on the above two declarations, we will get:

class ???
type_decl t type record_type at test.cxx:2

Ok, so this works as expected. Now how can we get the name of the class from the RECORD_TYPE node? In the above code we could have passed the declaration node along with the type node to the print_class() function. But that’s not very elegant and is not always possible, as we will see in a moment. Instead, we can use the TYPE_NAME macro to get to the type’s declaration. There are a couple of caveats, however. First, remember that the same type can have multiple tree nodes in the AST. You can also get different declarations for different type nodes denoting the same type. Then the same type node can be declared with multiple declarations. For example, there could be multiple typedef names for the same type. So which declaration are we going to get? There is no simple answer to this question. However, if you get the primary type with TYPE_MAIN_VARIANT and then get its declaration with TYPE_NAME and if the type was named in the source code, then this will be the artificial declaration that we talked about before. Here is the new implementation of print_class() that uses this technique:

void
print_class (tree type)
{
  type = TYPE_MAIN_VARIANT (type);
 
  tree decl (TYPE_NAME (type));
  tree id (DECL_NAME (decl));
  const char* name (IDENTIFIER_POINTER (id));
 
  cerr << "class " << name << " at "
       << DECL_SOURCE_FILE (decl) << ":"
       << DECL_SOURCE_LINE (decl) << endl;
}

Running this version of the plugin on the above code fragment produces the expected output:

class c at test.cxx:1
type_decl t type record_type at test.cxx:2

Let’s now print some more information about the class. Things that we may be interested in include base classes, member variables, member functions, and nested type declarations. We will start with the list of base classes. The base classes of a particular class are represented as a vector of BINFO tree nodes and can be obtained with the TYPE_BINFO macro. To get the number of elements in this vector we use the BINFO_N_BASE_BINFOS macro. To get the Nth element we use the BINFO_BASE_BINFO macro. The macros that we can use on the BINFO node include BINFO_VIRTUAL_P, which returns true if the base is virtual, and BINFO_TYPE, which returns the tree node for the base type itself.

Naturally, you may also expect that there is a macro named something like BINFO_ACCESS which returns the access specifier (public, protected, or private) for the base. If so, then you haven’t really gotten the spirit of the GCC AST design yet: if something would feel simple and intuitive, then find a way to make it convoluted and surprising. So, no, there is no macro to get the access specifier from the base’s BINFO node. In fact, this information is not even stored in that node. Rather, it is stored in a vector that runs parallel to the BINFO nodes. The Nth element in this vector can be accessed with the BINFO_BASE_ACCESS macro. The following code fragment shows how to put all this information together:

enum access_spec
{
  public_, protected_, private_
};
 
const char* access_spec_str[] =
{
  "public", "protected", "private"
};
 
void
print_class (tree type)
{
  type = TYPE_MAIN_VARIANT (type);
 
  ...
 
  // Traverse base information.
  //
  tree biv (TYPE_BINFO (type));
  size_t n (biv ? BINFO_N_BASE_BINFOS (biv) : 0);
 
  for (size_t i (0); i < n; i++)
  {
    tree bi (BINFO_BASE_BINFO (biv, i));
 
    // Get access specifier.
    //
    access_spec a (public_);
 
    if (BINFO_BASE_ACCESSES (biv))
    {
      tree ac (BINFO_BASE_ACCESS (biv, i));
 
      if (ac == 0 || ac == access_public_node)
        a = public_;
      else if (ac == access_protected_node)
        a = protected_;
      else
        a = private_;
    }
 
    bool virt (BINFO_VIRTUAL_P (bi));
    tree b_type (TYPE_MAIN_VARIANT (BINFO_TYPE (bi)));
    tree b_decl (TYPE_NAME (b_type));
    tree b_id (DECL_NAME (b_decl));
    const char* b_name (IDENTIFIER_POINTER (b_id));
 
    cerr << "\t" << access_spec_str[a]
         << (virt ? " virtual" : "")
         << " base " << b_name << endl;
  }
}

The list of member variable and nested type declarations can be obtained with the TYPE_FIELDS macro. It is a chain of *_DECL nodes, similar to namespaces. The declarations that can appear on this list include FIELD_DECL (non-static member variable declaration), VAR_DECL (static member variables), and TYPE_DECL (nested type declarations).

The list of member functions can be obtained with the TYPE_METHODS macro and can only contain the FUNCTION_DECL nodes. To determine if a function is static, use the DECL_STATIC_FUNCTION_P predicate. Other useful member function predicates include: DECL_CONSTRUCTOR_P, DECL_COPY_CONSTRUCTOR_P, and DECL_DESTRUCTOR_P.

To determine the access specifier for a member declaration you can use the TREE_PRIVATE and TREE_PROTECTED macros (note that TREE_PUBLIC appears to be used for a different purpose).

As with namespaces, the order of declarations on these lists is not preserved so if we want to traverse them in the source code order, we will need to employ the same technique as we used for traversing namespaces. The following code fragment shows how we can print some information about class members:

void
print_class (tree type)
{
  type = TYPE_MAIN_VARIANT (type);
 
  ...
 
  // Traverse members.
  //
  decl_set set;
 
  for (tree d (TYPE_FIELDS (type));
       d != 0;
       d = TREE_CHAIN (d))
  {
    switch (TREE_CODE (d))
    {
    case TYPE_DECL:
      {
        if (!DECL_SELF_REFERENCE_P (d))
          set.insert (d);
        break;
      }
    case FIELD_DECL:
      {
        if (!DECL_ARTIFICIAL (d))
          set.insert (d);
        break;
      }
    default:
      {
        set.insert (d);
        break;
      }
    }
  }
 
  for (tree d (TYPE_METHODS (type));
       d != 0;
       d = TREE_CHAIN (d))
  {
    if (!DECL_ARTIFICIAL (d))
      set.insert (d);
  }
 
  for (decl_set::iterator i (set.begin ()), e (set.end ());
       i != e; ++i)
  {
    print_decl (*i);
  }
}

We can now try to run all this code on a C++ class that has some bases and members, for example:

class b1 {};
class b2 {};
class c: protected b1,
         public virtual b2
{
  int i;
  static int s;
  void f ();
  c (int);
  ~c ();
  typedef int t;
  class n {};
};

And below is the output from our plugin. Here we use the version that prints fully-qualified names for declarations:

class ::b1 at test.cxx:1
class ::b2 at test.cxx:2
var_decl ::_ZTI1c type record_type at test.cxx:5
class ::c at test.cxx:5
        protected base ::b1
        public virtual base ::b2
field_decl ::c::i type integer_type at test.cxx:6
var_decl ::c::s type integer_type at test.cxx:7
function_decl ::c::f type method_type at test.cxx:8
function_decl ::c::c type method_type at test.cxx:9
function_decl ::c::__base_ctor  type method_type at test.cxx:9
function_decl ::c::__comp_ctor  type method_type at test.cxx:9
function_decl ::c::c type method_type at test.cxx:10
function_decl ::c::__base_dtor  type method_type at test.cxx:10
function_decl ::c::__comp_dtor  type method_type at test.cxx:10
type_decl ::c::t type integer_type at test.cxx:11
class ::c::n at test.cxx:12

Figuring out what the _ZTI1c, __base_ctor, __comp_ctor, __base_dtor, and __comp_dtor declarations are is left as an exercise for the reader.

And that’s it for today as well as for the series. There are a number of GCC AST areas, such as C++ templates, function declarations, function bodies, #include information, custom #pragmas and attributes, etc., that haven’t been covered. However, I believe the GCC plugin and AST basics that were discussed in this and the two previous posts should be sufficient to get you started should you need to parse some C++.

If you have any questions, comments, or know the answer to the exercise above, you are welcome to leave them below. The complete source code for the plugin we have developed in this post is available as the plugin-3.cxx file.

Posted in GCC g++, C++ | 4 Comments »
