Archive for April, 2009

CodeSynthesis XSD/e 3.0.x released

Monday, April 20th, 2009

XSD/e 3.1.0 was released a couple of days ago. In fact, we released 3.0.0 about two months ago but I haven’t talked much about it. This is because after the 3.0.0 release we got quite a bit of very positive feedback along with requests for additional, more advanced features that we promised to add but haven’t yet implemented. So we decided to do another quick iteration and release 3.1.0. In this post I will highlight what’s new in both XSD/e 3.0.0 and 3.1.0 (official announcements: XSD/e 3.0.0 and XSD/e 3.1.0).

Prior to the 3.0.0 release, XSD/e only supported the event-driven XML parsing/serialization mode where you had to process/supply data as the document was being parsed/serialized. While this mode is particularly suitable for mobile and embedded systems due to low memory consumption, many users asked for an easier to use in-memory, tree-like representation of data stored in XML. As a result, XSD/e 3.0.0 shipped with a new XML Schema to C++ mapping: C++/Hybrid.

There were a number of challenges that we had to overcome before introducing such a mapping into XSD/e. Unlike general-purpose platforms, embedded systems are often severely constrained by the amount of memory available to the application. In fact, for single-purpose, mass-produced devices such as network modems the goal is to use as little RAM as possible since every megabyte not present in the device translates into huge savings for the manufacturer.

Thus, the first goal of the new mapping was to provide an in-memory representation of XML data using the least amount of RAM possible. For example, we couldn’t adopt the approach used in C++/Tree, our general-purpose in-memory mapping, where each node in the object model is allocated dynamically, because it wastes too much memory in extra pointers, heap management data, etc. At the same time we couldn’t allocate everything statically either since the copying involved in passing by value may be too expensive for some objects. As a result, the C++/Hybrid mapping divides all types into two categories: fixed-length and variable-length (if you are familiar with the IDL to C++ mapping in CORBA, you probably recognize the concept). Fixed-length types are allocated statically and returned by value while variable-length types are allocated dynamically and returned as pointers. This approach minimizes the memory usage while avoiding expensive copying. Consider the following schema fragment as an example:

<complexType name="point_t">
  <sequence>
    <element name="x" type="float"/>
    <element name="y" type="float"/>
    <element name="z" type="float"/>
  </sequence>
</complexType>

<complexType name="series_t">
  <sequence>
    <element name="value" type="int" maxOccurs="unbounded"/>
  </sequence>
</complexType>

<complexType name="measure_t">
  <sequence>
    <element name="point" type="point_t"/>
    <element name="series" type="series_t"/>
  </sequence>
</complexType>

The corresponding C++/Hybrid object model is shown below:

  // point_t (fixed-length)
  //
  class point_t
  {
  public:
    float x () const;
    float& x ();
    void x (float);

    float y () const;
    float& y ();
    void y (float);

    float z () const;
    float& z ();
    void z (float);

  private:
    float x_;
    float y_;
    float z_;
  };

  // series_t (variable-length)
  //
  class series_t
  {
  public:
    typedef pod_sequence<int> value_sequence;
    typedef value_sequence::iterator value_iterator;
    typedef value_sequence::const_iterator value_const_iterator;

    const value_sequence& value () const;
    value_sequence& value ();

  private:
    value_sequence value_;
  };

  // measure_t (variable-length)
  //
  class measure_t
  {
  public:
    const point_t& point () const;
    point_t& point ();
    void point (const point_t&);

    const series_t& series () const;
    series_t& series ();
    void series (series_t*);

  private:
    point_t point_;
    series_t* series_;
  };

In the above example the point_t class is fixed-length and contained by value in the measure_t class. In contrast, series_t contains a sequence of ints which makes it variable-length (and expensive to copy). Instances of this class are dynamically allocated and stored as pointers in measure_t.
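To make the distinction concrete, here is a minimal, self-contained sketch of the same idea. It is not the code that XSD/e generates: std::vector stands in for pod_sequence, make_measure() is a hypothetical helper, and the rule that measure_t takes ownership of the pointer passed to series() is an assumption made for this illustration.

```cpp
#include <cassert>
#include <vector>

// Fixed-length: no pointers inside, cheap to copy, held and passed by value.
struct point_t
{
  float x, y, z;
};

// Variable-length: owns a growable sequence, so copying it is expensive.
// std::vector stands in for pod_sequence here (an assumption).
struct series_t
{
  std::vector<int> value;
};

class measure_t
{
public:
  measure_t (): point_ (), series_ (0) {}
  ~measure_t () {delete series_;}

  // Fixed-length member: copied in and stored by value.
  void point (const point_t& p) {point_ = p;}
  const point_t& point () const {return point_;}

  // Variable-length member: the pointer is adopted, nothing is copied.
  // Taking ownership is an assumption of this sketch.
  void series (series_t* s) {delete series_; series_ = s;}
  const series_t& series () const {assert (series_ != 0); return *series_;}

private:
  measure_t (const measure_t&);            // Not copyable in this sketch.
  measure_t& operator= (const measure_t&);

  point_t point_;
  series_t* series_;
};

// Hypothetical helper: builds a measure with n values. Only the small
// point_t is ever copied; the series is handed over by pointer.
inline measure_t*
make_measure (int n)
{
  measure_t* m = new measure_t;

  point_t p = {1.0f, 2.0f, 3.0f};
  m->point (p);

  series_t* s = new series_t;
  for (int i = 0; i < n; ++i)
    s->value.push_back (i);
  m->series (s); // m now owns s.

  return m;
}
```

The point is that handing over a series with a million values costs the same as handing over one with three, while the 12-byte point_t is simply copied.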

But even with the optimal memory usage an in-memory mapping may not be usable in an embedded environment for all but very small XML documents. A 100Kb document is trivial by today’s desktop or server standards. But loading such a document all at once into the memory on an embedded system may be prohibitively expensive. So we have the harder to use, especially for larger XML vocabularies, event-driven mode that uses very little RAM. And we have the more convenient, in-memory mode that for all but fairly small documents requires too much memory. In C++/Hybrid we solved this by supporting a hybrid (thus the mapping name) partially in-memory, partially event-driven mode. In this mode your application is supplied (in case of parsing) or it supplies (in case of serialization) the XML document in fragments represented as in-memory object models. The following example will help illustrate how this works. Let’s extend the schema presented above with the data_t type:

<complexType name="data_t">
  <sequence>
    <element name="measure" type="measure_t" maxOccurs="unbounded"/>
  </sequence>
</complexType>

<element name="data" type="data_t"/>

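The corresponding XML document has a data root element that contains a sequence of measure elements; each measure holds a point (with x, y, and z values) followed by a series of value elements.
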
Let’s assume such an XML document contains a couple of thousand measure records, which makes it too large to load into memory all at once. With C++/Hybrid you can set up parsing/serialization so that your application receives/supplies each measure one by one as an instance of the measure_t class. The depth in the XML document at which you “switch” from event-driven to in-memory processing is arbitrary and is not limited to the top level. For example, if instead of having thousands of measure records we only had a few but each containing hundreds of thousands of value records, we could have set up parsing/serialization in such a way that the application receives/supplies the point data as an instance of point_t and then each value one by one as int.
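The general shape of this partially in-memory, partially event-driven processing can be sketched without any XML at all. The code below is not the XSD/e API: measure, measure_handler, process(), and sum_handler are hypothetical names, and a simple one-record-per-line text format stands in for the XML document. What it does show is the key property: the driver builds one small in-memory object per record, hands it to the application, and discards it before reading the next one.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory fragment: one record of the document.
struct measure
{
  float x, y, z;
  std::vector<int> values;
};

// Called once per record; the record is valid only during the call.
struct measure_handler
{
  virtual ~measure_handler () {}
  virtual void handle (const measure&) = 0;
};

// Reads one record per line ("x y z v1 v2 ...") and passes each one to the
// handler. Only a single measure exists at any time, whatever the input size.
inline std::size_t
process (std::istream& is, measure_handler& h)
{
  std::size_t count = 0;
  std::string line;

  while (std::getline (is, line))
  {
    std::istringstream ls (line);
    measure m;

    if (!(ls >> m.x >> m.y >> m.z))
      continue; // Skip blank or malformed lines.

    for (int v; ls >> v; )
      m.values.push_back (v);

    h.handle (m); // In-memory fragment handed to the application.
    ++count;
  }               // m is destroyed at the end of each iteration.

  return count;
}

// Example handler: accumulates a total without keeping any record.
struct sum_handler: measure_handler
{
  long total;

  sum_handler (): total (0) {}

  virtual void handle (const measure& m)
  {
    for (std::size_t i = 0; i < m.values.size (); ++i)
      total += m.values[i];
  }
};
```

Feeding process() a stream with two lines, "1 2 3 10 20" and "4 5 6 30", calls the handler twice and leaves sum_handler::total at 60, with peak memory bounded by the largest single record.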

So that was XSD/e 3.0.0. After its release a number of people started using the new mapping and providing us with feedback. It became apparent that a couple of more advanced features that we left out of the initial C++/Hybrid release were needed. These were added in XSD/e 3.1.0, the two major ones being support for XML Schema polymorphism and binary serialization.

Support for polymorphism allows C++/Hybrid to handle XML vocabularies that use substitution groups and/or xsi:type dynamic typing. To minimize the generated code size we used a new approach where only certain type hierarchies (automatically detected in case of substitution groups and indicated by the user in case of xsi:type) are treated as polymorphic.

Binary serialization provides an extensible, high-performance mechanism for saving the object model to and loading it from compact binary formats for storage or over-the-wire transfer. Binary representations contain only the data without any meta information or markup. Consequently, saving to and loading from a binary format can be an order of magnitude faster as well as result in a much smaller application footprint compared to parsing and serializing the same data in XML. Plus, the resulting representation is normally several times smaller than the equivalent XML.

Built-in support is provided for XDR (via the Sun RPC API) and CDR (via the ACE library), and custom formats can be easily added. XDR appears to be a particularly good choice for a portable format since it is part of the operating systems on most commonly-used embedded platforms (for example, Linux, VxWorks, QNX, LynxOS, iPhone OS).
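To give a feel for what "only the data, no markup" means, here is a small hand-written sketch of an XDR-style encoding for a sequence of 32-bit integers: a 4-byte big-endian element count followed by the elements, each as 4 big-endian bytes. It uses neither the Sun RPC API nor the XSD/e serialization interface; buffer, put_u32(), get_u32(), save(), and load() are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdint.h>
#include <vector>

typedef std::vector<unsigned char> buffer;

// Appends a 32-bit value in big-endian byte order (the order XDR uses).
inline void
put_u32 (buffer& b, uint32_t v)
{
  b.push_back (static_cast<unsigned char> (v >> 24));
  b.push_back (static_cast<unsigned char> (v >> 16));
  b.push_back (static_cast<unsigned char> (v >> 8));
  b.push_back (static_cast<unsigned char> (v));
}

// Reads a big-endian 32-bit value at pos and advances pos past it.
inline uint32_t
get_u32 (const buffer& b, std::size_t& pos)
{
  assert (pos + 4 <= b.size ());

  uint32_t v = (uint32_t (b[pos]) << 24) | (uint32_t (b[pos + 1]) << 16) |
               (uint32_t (b[pos + 2]) << 8) | uint32_t (b[pos + 3]);
  pos += 4;
  return v;
}

// Saves a sequence as a count followed by the elements; no names or markup.
inline void
save (buffer& b, const std::vector<int32_t>& s)
{
  put_u32 (b, static_cast<uint32_t> (s.size ()));

  for (std::size_t i = 0; i < s.size (); ++i)
    put_u32 (b, static_cast<uint32_t> (s[i]));
}

// Loads a sequence written by save(), starting at pos.
inline std::vector<int32_t>
load (const buffer& b, std::size_t& pos)
{
  uint32_t n = get_u32 (b, pos);

  std::vector<int32_t> s;
  s.reserve (n);

  for (uint32_t i = 0; i < n; ++i)
    s.push_back (static_cast<int32_t> (get_u32 (b, pos)));

  return s;
}
```

Three values occupy 16 bytes in this representation, and because the byte order is fixed by the format rather than by the CPU, the same bytes can be read back on a machine of either endianness.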

One common use-case for binary serialization is an embedded system that needs to consume and/or supply data in XML format but cannot afford to include an XML parser and/or serializer due to performance or footprint constraints. The requirement to use XML may come from the use of existing or third party desktop/server applications on the other end or from the use of industry-standard, XML-based formats. In this situation a control or gateway application running in a non-embedded environment translates the XML data sent to the embedded systems to a binary representation and then translates the binary representation received from the embedded systems back to XML.

There are a number of other interesting features in the C++/Hybrid mapping that I didn’t cover in this post, including:

  • Precise reproduction of the XML vocabulary structure and element order
  • Filtering of XML data during parsing and object model during serialization
  • Customizable object model classes as well as parsing and serialization code

If you would like more information on these and other features, the C++/Hybrid Mapping page is a good starting point.

C++ data alignment and portability

Monday, April 6th, 2009

The upcoming version of XSD/e adds support for serializing the object model to a number of binary data representation formats, such as XDR and CDR. It also supports custom binary formats. One person was beta-testing this functionality with the goal of achieving the fastest serialization/deserialization possible. He was willing to sacrifice the wider format portability across platforms as long as it was interoperable between iPhone OS and Mac OS X.

Since both iPhone OS on ARM and Mac OS X on x86 are little-endian and have compatible fundamental type sizes (e.g., int, long, double, etc., except for long double which is not used in XSD/e), the natural first optimization was to make the custom format’s endianness and type sizes match those of the target platforms. This allowed optimizations such as reading/writing sequences of fundamental types with a memcpy() call instead of a for loop. After achieving these improvements he then suggested what would seem like a natural next optimization. If we can handle fundamental types with memcpy(), why can’t we do the same for simple classes that don’t have any pointer members (fixed-length types in the XSD/e object model terms)? When designing a “raw” binary format like this, most people are aware of the type size and endianness compatibility issues. But there is another issue that we need to be aware of if we try to do this kind of optimization: data alignment compatibility.
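The first optimization might look roughly like the sketch below. It is an illustration rather than the actual XSD/e code: buffer, write_loop(), write_bulk(), and read_bulk() are hypothetical names, and the whole thing relies on the assumption stated above that the writer and the reader agree on the size and byte order of int.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

typedef std::vector<char> buffer;

// Element by element: one small append per value.
inline void
write_loop (buffer& b, const std::vector<int>& s)
{
  for (std::size_t i = 0; i < s.size (); ++i)
  {
    const char* p = reinterpret_cast<const char*> (&s[i]);
    b.insert (b.end (), p, p + sizeof (int));
  }
}

// Bulk: the whole sequence is copied with a single memcpy() call. This
// works because the elements of a sequence of int are contiguous and the
// format's int is, by assumption, identical to the platform's int.
inline void
write_bulk (buffer& b, const std::vector<int>& s)
{
  if (s.empty ())
    return;

  std::size_t n = s.size () * sizeof (int);
  std::size_t old = b.size ();

  b.resize (old + n);
  std::memcpy (&b[old], &s[0], n);
}

// Reads count values starting at byte offset pos, again with one memcpy().
inline void
read_bulk (const buffer& b,
           std::size_t pos,
           std::vector<int>& s,
           std::size_t count)
{
  assert (pos + count * sizeof (int) <= b.size ());

  s.resize (count);

  if (count != 0)
    std::memcpy (&s[0], &b[pos], count * sizeof (int));
}
```

Both writers produce exactly the same bytes; the bulk version just gets there without the per-element overhead.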

First, a quick introduction to the data alignment and C++ data structure padding. For a more detailed treatment of this subject, see, for example, Data alignment: Straighten up and fly right. Modern CPUs are capable of reading data from memory in chunks, for example, 2, 4, 8, or 16 bytes at a time. But due to the memory organization, the addresses of these chunks should be multiples of their sizes. If an address satisfies this requirement, then it is said to be properly aligned. The consequences of accessing data via an unaligned address can range from slower execution to program termination, depending on the CPU architecture and operating system.
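In code, the "address is a multiple of the chunk size" test is a one-liner. The helper below is a hypothetical illustration; it assumes the chunk size is a power of two and that converting a pointer to an integer yields the plain memory address, which holds on the mainstream platforms discussed here.

```cpp
#include <cassert>
#include <cstddef>
#include <stdint.h>

// True if the address p is a multiple of n, where n is a power of two
// (2, 4, 8, 16, ...). For such n, "p mod n" is the same as "p & (n - 1)".
inline bool
is_aligned (const void* p, std::size_t n)
{
  assert (n != 0 && (n & (n - 1)) == 0);
  return (reinterpret_cast<uintptr_t> (p) & (n - 1)) == 0;
}
```

For example, address 16 is properly aligned for an 8-byte access while address 10 is not aligned for a 4-byte one.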

Now let’s move one level up to C++. The language provides a set of fundamental types of various sizes. To make manipulating variables of these types fast, the generated object code will try to use CPU instructions which read/write the whole data type at once. This in turn means that the variables of these types should be placed in memory in a way that makes their addresses suitably aligned. As a result, besides size, each fundamental type has another property: its alignment requirement. It may seem that the fundamental type’s alignment is the same as its size. This is not generally the case since the most suitable CPU instruction for a particular type may only be able to access a part of its data at a time. For example, a CPU may only be able to read at most 4 bytes at a time so a 64-bit long long type will have a size of 8 and an alignment of 4.

GNU g++ has a language extension that allows you to query a type’s alignment. The following program prints fundamental type sizes and alignment requirements of a platform for which it was compiled:

#include <iostream>

using namespace std;

template <typename T>
void print (char const* name)
{
  cerr << name
       << " sizeof = " << sizeof (T)
       << " alignof = " << __alignof__ (T)
       << endl;
}

int main ()
{
  print<bool>        ("bool          ");
  print<wchar_t>     ("wchar_t       ");
  print<short>       ("short int     ");
  print<int>         ("int           ");
  print<long>        ("long int      ");
  print<long long>   ("long long int ");
  print<float>       ("float         ");
  print<double>      ("double        ");
  print<long double> ("long double   ");
  print<void*>       ("void*         ");
}

The following listing shows the result of running this program on a 32-bit x86 GNU/Linux machine. Notice the size and alignment of the long long, double, and long double types.

bool           sizeof = 1  alignof = 1
wchar_t        sizeof = 4  alignof = 4
short int      sizeof = 2  alignof = 2
int            sizeof = 4  alignof = 4
long int       sizeof = 4  alignof = 4
long long int  sizeof = 8  alignof = 4
float          sizeof = 4  alignof = 4
double         sizeof = 8  alignof = 4
long double    sizeof = 12 alignof = 4
void*          sizeof = 4  alignof = 4

[Actually, when run on this platform, the above program reports the alignment of long long and double as 8. The listing shows 4 instead because that is what the IA32 ABI specifies for these types. Also, if you wrap long long or double in a struct and take the alignment of the resulting type, it will be 4, not 8.]

And the following listing is for 64-bit x86-64 GNU/Linux:

bool           sizeof = 1  alignof = 1
wchar_t        sizeof = 4  alignof = 4
short int      sizeof = 2  alignof = 2
int            sizeof = 4  alignof = 4
long int       sizeof = 8  alignof = 8
long long int  sizeof = 8  alignof = 8
float          sizeof = 4  alignof = 4
double         sizeof = 8  alignof = 8
long double    sizeof = 16 alignof = 16
void*          sizeof = 8  alignof = 8
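The struct-wrapping trick mentioned in the note on the 32-bit listing can be turned into a small portable probe. This is a sketch using only the standard offsetof macro; probe and member_alignment() are hypothetical names. The idea is that a type's alignment as a struct member is the offset at which the compiler places it right after a single char.

```cpp
#include <cassert>
#include <cstddef>

// A char followed by a T: the compiler pads after c so that t is suitably
// aligned as a struct member, which makes the offset of t equal to the
// alignment T gets inside a struct.
template <typename T>
struct probe
{
  char c;
  T t;
};

template <typename T>
inline std::size_t
member_alignment ()
{
  return offsetof (probe<T>, t);
}
```

Based on the listings above, member_alignment<long long> () would be expected to return 4 on 32-bit x86 GNU/Linux and 8 on x86-64, though I have only reasoned about this from the ABI rules rather than measured it on every compiler.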

The C++ compiler also needs to make sure that member variables in a struct or class are properly aligned. For this, the compiler may insert padding bytes between member variables. Additionally, to make sure that each element in an array of a user-defined type is aligned, the compiler may add some extra padding after the last data member. Consider the following struct as an example:

struct foo
{
  bool a;
  short b;
  long long c;
  bool d;
};

The compiler always assumes that an instance of foo will start at an address aligned to the strictest alignment requirement among foo’s members, which is that of long long in our case. This is actually how the alignment requirements of user-defined types are calculated. Assuming we are on x86-64, where short has an alignment of 2 and long long an alignment of 8, to make the b member suitably aligned the compiler needs to insert an extra byte between a and b. Similarly, to align c, the compiler needs to insert four bytes after b. Finally, to make sure the next element in an array of foos starts at an address aligned to 8, the compiler needs to add seven bytes of padding at the end of struct foo. Here is the actual memory image of this struct with the positions of each member when the object is allocated at an example address 8:

                 // addr  alignment
struct foo       // 8     8
{
  bool a;        // 8     1
  char pad1[1];
  short b;       // 10    2
  char pad2[4];
  long long c;   // 16    8
  bool d;        // 24    1
  char pad3[7];
};               // 32    8  (next element in array)
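If you want to check this layout on your own compiler rather than take the picture on faith, offsetof and sizeof are enough. The sketch below redeclares foo so that it is self-contained; foo_padding() and foo_has_x86_64_layout() are hypothetical helpers, and the offsets they test for are the x86-64 ones described above, so on other platforms the second function may legitimately return false.

```cpp
#include <cassert>
#include <cstddef>

struct foo
{
  bool a;
  short b;
  long long c;
  bool d;
};

// Total number of padding bytes the compiler added to foo: the size of the
// struct minus the sizes of its members.
inline std::size_t
foo_padding ()
{
  return sizeof (foo) -
    (sizeof (bool) + sizeof (short) + sizeof (long long) + sizeof (bool));
}

// True if the layout matches the x86-64 memory image: a at 0, b at 2,
// c at 8, d at 16, and a total size of 24 bytes.
inline bool
foo_has_x86_64_layout ()
{
  return offsetof (foo, a) == 0 &&
         offsetof (foo, b) == 2 &&
         offsetof (foo, c) == 8 &&
         offsetof (foo, d) == 16 &&
         sizeof (foo) == 24;
}
```

With the type sizes from the x86-64 listing, foo_padding () accounts for the 1 + 4 + 7 = 12 padding bytes shown in the memory image.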

Now back to our question about serializing simple classes with memcpy(). It should be clear by now that to be able to save a user-defined type with memcpy() on one platform and then load it on another, the two platforms not only need to have fundamental types of the same sizes and be of the same endianness, but they also need to be alignment-compatible. Otherwise, the positions of members inside the type and even the size of the type itself can differ. And this is exactly what happens if we try to move the data corresponding to foo between x86 and x86-64 even though the types used in the struct are of the same size. Here is what the padded memory image of foo on x86 looks like:

struct foo
{
  bool a;
  char pad1[1];
  short b;
  long long c;
  bool d;
  char pad2[3];
};

Because the alignment of long long on this platform is 4, padding between b and c is no longer necessary and padding at the end of the struct is 3 bytes instead of 7. The size of this struct is 16 bytes on x86 and 24 bytes on x86-64.
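One way to keep most of the memcpy() speed while sidestepping the padding problem is to copy the members one by one into a packed buffer, so that the compiler's padding never reaches the wire. The sketch below is only an illustration of that idea, not what XSD/e does; put(), get(), save(), load(), and foo_packed_size are hypothetical names, and it still assumes that both platforms agree on the member sizes and endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct foo
{
  bool a;
  short b;
  long long c;
  bool d;
};

// Bytes occupied by the members alone, with no padding.
const std::size_t foo_packed_size =
  sizeof (bool) + sizeof (short) + sizeof (long long) + sizeof (bool);

// Copies one member into the buffer and returns the next write position.
template <typename T>
inline char*
put (char* p, const T& v)
{
  std::memcpy (p, &v, sizeof (T));
  return p + sizeof (T);
}

// Copies one member out of the buffer and returns the next read position.
template <typename T>
inline const char*
get (const char* p, T& v)
{
  std::memcpy (&v, p, sizeof (T));
  return p + sizeof (T);
}

// buf must have room for foo_packed_size bytes.
inline void
save (char* buf, const foo& f)
{
  char* p = buf;
  p = put (p, f.a);
  p = put (p, f.b);
  p = put (p, f.c);
  p = put (p, f.d);
  assert (p == buf + foo_packed_size);
}

// buf must hold foo_packed_size bytes written by save().
inline void
load (const char* buf, foo& f)
{
  const char* p = buf;
  p = get (p, f.a);
  p = get (p, f.b);
  p = get (p, f.c);
  p = get (p, f.d);
  assert (p == buf + foo_packed_size);
}
```

With the type sizes from the listings above, the packed image is 12 bytes on both x86 and x86-64, whereas sizeof (foo) is 16 on one and 24 on the other. The price is four small copies instead of one large one.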

[For those curious about Mac OS X on x86 and iPhone OS on ARM, they are alignment-compatible, as long as you don’t use long double which has different sizes on the two platforms.]