A Sense of Design

Archive for the ‘XML’ Category

CodeSynthesis XSD 4.0.0 Released

Tuesday, July 22nd, 2014

CodeSynthesis XSD 4.0.0 was released today.

In case you are not familiar with XSD, it is an open-source, cross-platform XML Schema to C++ data binding compiler. Provided with an XML instance specification (XML Schema), it generates C++ classes that represent the given vocabulary as well as XML parsing and serialization code. You can then access the data stored in XML using types and functions that semantically correspond to your application domain rather than dealing with the intricacies of reading and writing raw XML.

For an exhaustive list of new features see the official announcement. Below I am going to cover the notable new features in more detail and include some insight into what motivated their addition.

Ok, this is a major new release. So what are the major changes and new features? Well, firstly, we removed quite a bit of “outdated backwards-compatibility baggage”, such as support for the Xerces-C++ 2-series (2.8.0) and Visual Studio 2003 (7.1). At the same time, the good news is that there aren’t any changes that will break existing code. What has changed a lot are the compiler internals and, especially, the dependencies, which makes building XSD from source much easier.

While removing old stuff we also added support for new C++ compilers that popped up since the last release. XSD now supports Clang as well as Visual Studio 2012 (11.0) and 2013 (12.0).

Ok, let’s now examine the major new features. The biggest is support for C++11 (the --std c++11 option). While this mode brings many small changes to the generated code, the two major ones are the reliance on move semantics and the use of std::unique_ptr instead of the deprecated std::auto_ptr.

Another big feature in this release is support for ordered types. XSD flattens nested XML Schema compositors to give us a nice and simple C++ API. This works very well in most cases, especially for more complex schemas. Sometimes, however, this can lead to the loss of relative element ordering that can be semantically important to the application (the “unordered choice” XML Schema idiom). Now you can mark such types as ordered, which makes XSD generate an additional order tracking API. So now you can have the best of both worlds: a nice and simple API in most cases and additional order information in the few places where the simple API is not enough.

Once we had this implemented, another stubbornly annoying feature, mixed content, got sorted out. The problem with mixed content is that the text fragments can appear interleaved with elements in pretty much any order. Extracting the text is easy; it is preserving the order information relative to the elements that’s the tricky part. But now we have the perfect mechanism for that. One user who was beta-testing this feature said: “I read the new documentation and I’m impressed.”

You can read more on ordered types in Section 2.8.4, “Element Order” and on mixed content in Section 2.13, “Mapping for Mixed Content Models” in the C++/Tree Mapping User Manual.

Another problem that is somewhat similar to mixed content is access to data represented by the xs:anyType and xs:anySimpleType XML Schema types. anyType allows any content in any order. You can think of its definition as a complex type with mixed content that has an element wildcard that allows any elements and an attribute wildcard that allows any attributes. In other words, anything goes. XSD can already represent wildcard content as raw DOM fragments, so it was only natural to extend this support to anyType content. Similar to anyType, anySimpleType allows any simple content, that is, any text (pretty similar to xs:string in that sense). Now it is possible to get anySimpleType content as a text string.

For more information on this new feature see Section 2.5.2, “Mapping for anyType” and Section 2.5.3, “Mapping for anySimpleType” in the C++/Tree Mapping User Manual.

Another cool feature in XSD is the stream-oriented, partially in-memory XML processing that allows parsing and serialization of XML documents in chunks. This allows us to process parts of the document as they become available as well as handle documents that are too large to fit into memory. XSD comes with an example, called streaming, that shows how to set all this up. In this release this example has been significantly improved. It now has much better XML namespace handling and allows streaming at multiple document levels. This turned out to be really useful for handling large and complex documents such as GML/CityGML.

Last but not least, those of us who still prefer to write our own makefiles will be happy to know that XSD now supports automatic make-style dependency generation, similar to GCC’s -M* functionality but with sane option names. See the XSD Compiler Command Line Manual (man pages) for details.

Posted in XML, C++

libstudxml – modern XML API for C++

Tuesday, May 20th, 2014

My talk at this year’s C++Now was about an XML API for modern C++. An API that I believe should have already been in Boost or even in the C++ standard library. Presenting an API without an implementation would be rather lame, so during my talk I also announced libstudxml, which is an open source (MIT) compact, external dependency-free, and reasonably efficient XML library for modern, standard C++. In other words, a library that you can use in pretty much any project and on any platform without much fuss.

A piece of code is worth a thousand words, so let me give you a taste of the API. For this XML:

<person id="123">
  <name>John Doe</name>
  <age>23</age>
  <gender>male</gender>
</person>

The parsing code could look like this:

enum class gender {...};
 
ifstream ifs (argv[1]);
parser p (ifs, argv[1]);
 
p.next_expect (parser::start_element, "person", content::complex);
 
long id = p.attribute<long> ("id");
 
string n = p.element ("name");
short a = p.element<short> ("age");
gender g = p.element<gender> ("gender");
 
p.next_expect (parser::end_element); // person

And that’s with all the validation necessary for this XML vocabulary. But I don’t see any exceptions being thrown, you might say. And that’s exactly the point: the validation happens inside the API calls, so the application code contains no explicit checks. Here is the list of interesting features this API has:

Streaming pull parser and streaming serializer
Two-level API: minimum overhead low-level & more convenient high-level
Content model-aware (empty, simple, complex, mixed)
Whitespace processing based on content model
Validation based on content model
Validation of missing/extra attributes
Validation of unexpected events (elements, etc)
Data extraction to value types
Attribute map with extended lifetime (high-level API)

The XML parser in libstudxml is a conforming, non-validating XML 1.0 implementation that is based on tested and proven code (see Implementation Notes for details). A lot of people ask me why not use one of the new, claimed to be super fast and/or compact XML libraries for C++ that are already out there (RapidXML, PugiXML, TinyXML, etc)? The main reason is that they are not real, as in conforming, XML parsers. I discuss why you should stick to real XML parsers in my talk. Hopefully the videos will be posted soon.

Interested? For more information on the API you can jump directly to the Introduction which shows a lot of examples. Or you can grab and build the source code distribution from the libstudxml project page. On Unix, building the library is a matter of ./configure && make. On Windows, projects/solutions are provided for VC++ 9, 10, 11, and 12. There are also quite a few interesting examples inside the distribution.

Posted in XML, C++

XSD/e 3.2.0 released

Wednesday, February 16th, 2011

XSD/e 3.2.0 was released yesterday. In case you are not familiar with XSD/e, it is a dependency-free XML Schema to C++ compiler for mobile, embedded, and light-weight C++ applications. It provides XML parsing, serialization, XML Schema validation and XML data binding while maintaining a small footprint and portability.

This version includes a number of major new features (examined in more detail next), small improvements, and bug fixes. It also adds official support, instructions, and sample configuration files for the following platforms and toolchains:

Android/Android NDK
Symbian/CSL-GCC (GCCE)
Integrity 178b/Green Hills MULTI C/C++

It is now also possible to build the XSD/e runtime library for iPhoneOS/iOS with the Xcode project.

Enum mapping

Probably the most visible new feature is the mapping of XML Schema enumerations to C++ enum. Consider, for example, the following schema fragment:

<simpleType name="genre">
  <restriction base="string">
    <enumeration value="romance"/>
    <enumeration value="fiction"/>
    <enumeration value="horror"/>
    <enumeration value="history"/>
    <enumeration value="philosophy"/>
  </restriction>
</simpleType>

The new interface for the genre C++ class will look like this:

class genre
{
public:
  enum value_type
  {
    romance,
    fiction,
    horror,
    history,
    philosophy
  };
 
  genre ();
  genre (value_type);
 
  void value (value_type);
  operator value_type () const;
  const char* string () const;
};

And we can use this class like this:

genre g (genre::fiction);
 
if (g != genre::philosophy)
  cout << g.string () << endl;

Memory allocators

You can now configure the XSD/e runtime and generated code to perform memory management using custom memory allocator functions provided by your application instead of the standard operator new/delete. The allocator example provided with the XSD/e distribution uses this feature to completely eliminate dynamic memory allocations. In particular, the resulting object model is placed into a memory block allocated on the stack. Cool, huh?

Schema evolution

Often new features that are added to the application require changes to the corresponding schemas. In mobile and embedded systems this poses a significant challenge since it is often impossible to upgrade the devices that are out in the field. As a result, it is a common requirement that devices running older versions of the software be able to cope with data that is based on the newer versions of the schema. In this release XSD/e adds support for such schema evolution using the substitution groups mechanism. Both the ignore and passthrough models for unknown content are supported.

Configurable character encoding

It is now possible to configure the character encoding used in the application with the currently supported options being UTF-8 and ISO-8859-1. Note that this encoding is not the same as the XML document encoding that is being parsed or serialized. Rather, it is the encoding that is used inside the application. When an XML document is parsed, the character data is automatically converted to the application encoding. Similarly, when an XML document is serialized, the data in the application encoding is automatically converted to the resulting document encoding.

There are also other important features in this release, including the generation of clone functions for variable-length types and improved support for XML Schema facet validation, including xs:pattern. For an exhaustive list of new features see the official XSD/e 3.2.0 release announcement.

Posted in Mobile/Embedded, XML, C++