CodeSynthesis XSD 4.0.0 Released

July 22nd, 2014

CodeSynthesis XSD 4.0.0 was released today.

In case you are not familiar with XSD, it is an open-source, cross-platform XML Schema to C++ data binding compiler. Provided with an XML instance specification (XML Schema), it generates C++ classes that represent the given vocabulary as well as XML parsing and serialization code. You can then access the data stored in XML using types and functions that semantically correspond to your application domain rather than dealing with the intricacies of reading and writing raw XML.

For an exhaustive list of new features see the official announcement. Below I am going to cover the notable new features in more detail and include some insight into what motivated their addition.

Ok, this is a major new release. So what are the major changes and new features? First, we removed quite a bit of "outdated backwards-compatibility baggage", such as support for the Xerces-C++ 2-series (2.8.0) and Visual Studio 2003 (7.1). The good news is that none of the changes will break existing code. What has changed a lot are the compiler internals and, especially, the dependencies, which should make building XSD from source much easier.

While removing old stuff, we also added support for the new C++ compilers that have appeared since the last release: XSD now supports Clang as well as Visual Studio 2012 (11.0) and 2013 (12.0).

Now let's examine the major new features. The biggest is support for C++11 (the --std c++11 option). While there are many small changes in the generated code when this mode is enabled, the two major ones are reliance on move semantics and the use of std::unique_ptr instead of the deprecated std::auto_ptr.
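To illustrate what the switch to std::unique_ptr means for user code, here is a minimal sketch. Note that `person` and `parse_person` are hypothetical names standing in for generated code; this is not actual XSD output, just plain C++11 showing the ownership semantics involved.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a generated type (not actual XSD output).
class person
{
public:
  explicit person (std::string name): name_ (std::move (name)) {}

  const std::string& name () const {return name_;}

private:
  std::string name_;
};

// Hypothetical parsing function. In C++11 mode ownership of the result is
// expressed with std::unique_ptr rather than the deprecated std::auto_ptr.
std::unique_ptr<person> parse_person (const std::string& name)
{
  return std::unique_ptr<person> (new person (name));
}

// Transferring ownership out of an lvalue must now be spelled out with
// std::move; with std::auto_ptr a plain copy silently took the pointer.
std::unique_ptr<person> take_ownership (std::unique_ptr<person>& p)
{
  return std::move (p);
}
```

The explicit std::move is the point of the design: the ownership transfer that std::auto_ptr performed implicitly on copy is now visible in the source.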

Another big feature in this release is support for ordered types. XSD flattens nested XML Schema compositors to give us a nice and simple C++ API. This works very well in most cases, especially for more complex schemas. Sometimes, however, flattening loses the relative order of elements, which can be semantically important to the application (the "unordered choice" XML Schema idiom). Now you can mark such types as ordered, which makes XSD generate an additional order-tracking API. So you get the best of both worlds: a nice and simple API in most cases and additional order information in the few places where the simple API is not enough.
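The idea behind order tracking can be sketched without XSD itself. In this hypothetical model (the names `choice`, `content_order`, `a_id`, and `b_id` are illustrative assumptions; see Section 2.8.4 of the manual for the actual generated API), the flattened containers stay as they are and a separate list records which element came next and where to find it:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical model of an "unordered choice": a repeated choice between
// <a> (an int) and <b> (a string). The flattened API keeps all a's and all
// b's in separate containers...
struct choice
{
  std::vector<int> a;
  std::vector<std::string> b;

  // ...so the relative order is tracked separately as (element id, index)
  // pairs, where index points into the corresponding container.
  enum element_id {a_id, b_id};

  struct content_order
  {
    element_id id;
    std::size_t index;
  };

  std::vector<content_order> order;
};

// Walk the order list to visit the elements in their original document order.
std::string in_document_order (const choice& c)
{
  std::ostringstream os;
  bool first (true);

  for (const choice::content_order& o: c.order)
  {
    if (!first)
      os << ' ';
    first = false;

    switch (o.id)
    {
    case choice::a_id: os << "a:" << c.a[o.index]; break;
    case choice::b_id: os << "b:" << c.b[o.index]; break;
    }
  }

  return os.str ();
}
```

Code that doesn't care about order simply ignores the `order` list and keeps using the flattened containers.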

Once we had this implemented, another stubbornly annoying feature, mixed content, got sorted out as well. The problem with mixed content is that text fragments can be interleaved with elements in pretty much any order. Extracting the text is easy; preserving its order relative to the elements is the tricky part. But now we have the perfect mechanism for that. One user who was beta-testing this feature said: "I read the new documentation and I'm impressed."
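The same mechanism covers mixed content if text fragments are treated as just another kind of ordered item. The sketch below is a hypothetical model (`paragraph`, `text_id`, `b_id`, and `to_xml` are illustrative names, not XSD's generated API) for content such as `<p>Hello <b>world</b>!</p>`:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a mixed-content type. Text fragments and <b>
// elements live in separate containers; the order list records how they
// were interleaved in the document.
struct paragraph
{
  std::vector<std::string> text;   // character data fragments
  std::vector<std::string> b;      // content of <b> child elements

  enum item_id {text_id, b_id};

  struct content_order
  {
    item_id id;
    std::size_t index;
  };

  std::vector<content_order> order;
};

// Re-serialize the content, interleaving text and elements in their
// original order (XML escaping omitted for brevity).
std::string to_xml (const paragraph& p)
{
  std::string r ("<p>");

  for (const paragraph::content_order& o: p.order)
  {
    if (o.id == paragraph::text_id)
      r += p.text[o.index];
    else
      r += "<b>" + p.b[o.index] + "</b>";
  }

  return r + "</p>";
}
```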

You can read more on ordered types in Section 2.8.4, “Element Order” and on mixed content in Section 2.13, “Mapping for Mixed Content Models” in the C++/Tree Mapping User Manual.

Another problem, somewhat similar to mixed content, is access to data represented by the xs:anyType and xs:anySimpleType XML Schema types. anyType allows any content in any order. You can think of it as a complex type with mixed content, an element wildcard that allows any elements, and an attribute wildcard that allows any attributes. In other words, anything goes. XSD can already represent wildcard content as raw DOM fragments, so it was only natural to extend this support to anyType content. Similarly, anySimpleType allows any simple content, that is, any text (pretty similar to xs:string in that sense). Now it is possible to get anySimpleType content as a text string.
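To make "anything goes" concrete, here is a hypothetical, minimal data structure with the shape of anyType content: mixed text, any attributes, and any child elements in any order. XSD itself exposes this content as DOM fragments; `any_node`, `text`, `element`, and `collect_text` are illustrative names only. anySimpleType, by contrast, needs nothing more than a std::string.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for anyType content: a node is either a text
// fragment or an element with arbitrary attributes and arbitrary children.
struct any_node
{
  bool is_text;
  std::string value;                               // text, or element name
  std::map<std::string, std::string> attributes;   // any attributes
  std::vector<any_node> children;                  // any content, any order
};

any_node text (const std::string& s)
{
  return any_node {true, s, {}, {}};
}

any_node element (const std::string& name,
                  std::map<std::string, std::string> attrs,
                  std::vector<any_node> children)
{
  return any_node {false, name, std::move (attrs), std::move (children)};
}

// Gather all character data in document order, recursing into elements.
std::string collect_text (const any_node& n)
{
  if (n.is_text)
    return n.value;

  std::string r;
  for (const any_node& c: n.children)
    r += collect_text (c);
  return r;
}
```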

For more information on this new feature see Section 2.5.2, “Mapping for anyType” and Section 2.5.3, “Mapping for anySimpleType” in the C++/Tree Mapping User Manual.

Another cool feature in XSD is the stream-oriented, partially in-memory XML processing that allows parsing and serialization of XML documents in chunks. This allows us to process parts of the document as they become available as well as handle documents that are too large to fit into memory. XSD comes with an example, called streaming, that shows how to set all this up. In this release this example has been significantly improved. It now has much better XML namespace handling and allows streaming at multiple document levels. This turned out to be really useful for handling large and complex documents such as GML/CityGML.

Last but not least, those of us who still prefer to write our own makefiles will be happy to know that XSD now supports automatic make-style dependency generation, similar to GCC's -M* functionality but with saner option names. See the XSD Compiler Command Line Manual (man pages) for details.

libstudxml – modern XML API for C++

May 20th, 2014

My talk at this year's C++Now was about an XML API for modern C++, an API that I believe should already be in Boost or even in the C++ standard library. Presenting an API without an implementation would be rather lame, so during my talk I also announced libstudxml, an open-source (MIT), compact, external dependency-free, and reasonably efficient XML library for modern, standard C++. In other words, a library that you can use in pretty much any project and on any platform without much fuss.

A piece of code is worth a thousand words, so let me give you a taste of the API. For this XML:

<person id="123">
  <name>John Doe</name>
  <age>23</age>
  <gender>male</gender>
</person>

The parsing code could look like this:

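#include <fstream>
#include <string>
#include <xml/parser>

using namespace std;
using namespace xml;

enum class gender {...};

// argv[1] is the input file; its name is also used in diagnostics.
ifstream ifs (argv[1]);
parser p (ifs, argv[1]);

// Expect <person> with complex content (whitespace between elements is ignored).
p.next_expect (parser::start_element, "person", content::complex);

long id = p.attribute<long> ("id");

string n = p.element ("name");
short a = p.element<short> ("age");
gender g = p.element<gender> ("gender"); // needs a value_traits<gender> specialization (not shown)

p.next_expect (parser::end_element); // person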

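And that's with all the validation this XML vocabulary requires. But I don't see any checks or exceptions being thrown, you might say. And that's exactly the point: the parser does the checking itself and throws an exception if the document doesn't match what the code expects. Here is a list of the more interesting features of this API: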

  • Streaming pull parser and streaming serializer
  • Two-level API: minimum overhead low-level & more convenient high-level
  • Content model-aware (empty, simple, complex, mixed)
  • Whitespace processing based on content model
  • Validation based on content model
  • Validation of missing/extra attributes
  • Validation of unexpected events (elements, etc.)
  • Data extraction to value types
  • Attribute map with extended lifetime (high-level API)
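Two items on this list, content model-aware whitespace processing and extraction to value types, can be illustrated with a small standalone sketch. It is not libstudxml's implementation; `keep_text` and `value` are hypothetical helpers, and the rules encoded here (whitespace dropped and other text rejected in empty and complex content, text kept in simple and mixed content) are a simplified assumption in the spirit of the list above.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Content models, mirroring the four kinds listed above.
enum class content {empty, simple, complex, mixed};

// Hypothetical sketch: decide what to do with a chunk of character data
// based on the content model of the enclosing element. Returns true if the
// text should be passed on to the application.
bool keep_text (content c, const std::string& text)
{
  bool ws (text.find_first_not_of (" \t\n\r") == std::string::npos);

  switch (c)
  {
  case content::empty:
  case content::complex:
    // Whitespace is insignificant here; anything else is a validation error.
    if (!ws)
      throw std::runtime_error ("unexpected character data");
    return false;
  case content::simple:
  case content::mixed:
    return true; // text is significant
  }

  return true; // not reached
}

// Hypothetical sketch of extraction to a value type: convert the text to T
// and reject leftovers, so that "12x" is not accepted as a number.
template <typename T>
T value (const std::string& text)
{
  std::istringstream is (text);
  T v;

  if (!(is >> v) || !is.eof ())
    throw std::invalid_argument ("invalid value '" + text + "'");

  return v;
}
```

Because the checks live inside these helpers, calling code like the parsing example above can stay free of explicit error handling.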

The XML parser in libstudxml is a conforming, non-validating XML 1.0 implementation that is based on tested and proven code (see Implementation Notes for details). A lot of people ask me why I don't use one of the newer XML libraries for C++ that claim to be super fast and/or compact (RapidXML, PugiXML, TinyXML, etc.). The main reason is that they are not real, as in conforming, XML parsers. I discuss why you should stick to real XML parsers in my talk. Hopefully the videos will be posted soon.

Interested? For more information on the API you can jump directly to the Introduction which shows a lot of examples. Or you can grab and build the source code distribution from the libstudxml project page. On Unix, building the library is a matter of ./configure && make. On Windows, projects/solutions are provided for VC++ 9, 10, 11, and 12. There are also quite a few interesting examples inside the distribution.

ODB packages for RHEL and Fedora

February 26th, 2014

Just a quick note to let you know that ODB 2.3.0 packages for RHEL 5 and 6 as well as Fedora 19-21 are now available from the official repositories (EPEL and Fedora). See the mailing list announcement for all the technical details.

While this is great news and a major achievement for ODB (especially considering that the ODB system comprises 7 packages), what saddens me somewhat, as a Debian user myself, is that there are no such packages for Debian/Ubuntu yet. In fact, there isn't even an effort under way. All we have is a "wishlist" bug report for ODB.

At the same time, I believe it should now be very straightforward to package ODB for Debian. The system was designed from the ground up to be packager-friendly (that's why it is split into multiple packages instead of being one monolithic block). Also, the packaging efforts, first for Gentoo and then for EL/Fedora, ironed out a lot of kinks. So if anyone is interested in packaging ODB for Debian, let us know (post to the odb-users mailing list). I will personally assist you in any way I can, as I did in the past for the Gentoo and EL/Fedora packaging efforts.