Archive for May, 2014

libstudxml – modern XML API for C++

Tuesday, May 20th, 2014

My talk at this year’s C++Now was about an XML API for modern C++. An API that I believe should have already been in Boost or even in the C++ standard library. Presenting an API without an implementation would be rather lame, so during my talk I also announced libstudxml, which is an open source (MIT) compact, external dependency-free, and reasonably efficient XML library for modern, standard C++. In other words, a library that you can use in pretty much any project and on any platform without much fuss.

A piece of code is worth a thousand words, so let me give you a taste of the API. For this XML:

<person id="123">
  <name>John Doe</name>
  <age>23</age>
  <gender>male</gender>
</person>

The parsing code could look like this:

enum class gender {...};
 
ifstream ifs (argv[1]);
parser p (ifs, argv[1]);
 
p.next_expect (parser::start_element, "person", content::complex);
 
long id = p.attribute<long> ("id");
 
string n = p.element ("name");
short a = p.element<short> ("age");
gender g = p.element<gender> ("gender");
 
p.next_expect (parser::end_element); // person

And that’s with all the validation necessary for this XML vocabulary. But I don’t see any exceptions being thrown, you might say. And that’s exactly the point. Here is the list of interesting features this API has:

  • Streaming pull parser and streaming serializer
  • Two-level API: minimum overhead low-level & more convenient high-level
  • Content model-aware (empty, simple, complex, mixed)
  • Whitespace processing based on content model
  • Validation based on content model
  • Validation of missing/extra attributes
  • Validation of unexpected events (elements, etc)
  • Data extraction to value types
  • Attribute map with extended lifetime (high-level API)

The XML parser in libstudxml is a conforming, non-validating XML 1.0 implementation that is based on tested and proven code (see Implementation Notes for details). A lot of people ask me why not use one of the new, claimed to be super fast and/or compact XML libraries for C++ that are already out there (RapidXML, PugiXML, TinyXML, etc)? The main reason is that they are not real, as in conforming, XML parsers. I discuss why you should stick to real XML parsers in my talk. Hopefully the videos will be posted soon.

Interested? For more information on the API you can jump directly to the Introduction which shows a lot of examples. Or you can grab and build the source code distribution from the libstudxml project page. On Unix, building the library is a matter of ./configure && make. On Windows, projects/solutions are provided for VC++ 9, 10, 11, and 12. There are also quite a few interesting examples inside the distribution.