Xerces-C++ 3.0.0 Released
Quite a few people believed this would never happen, but after many years of development Xerces-C++ 3.0.0 is finally out. This major release includes a large number of new features, bug fixes, and clean-ups. It also breaks a few interfaces (especially in DOM), so application adjustments may be required. For the complete list of changes in this version refer to the official announcement on the project’s mailing lists. In this post I am going to cover some of the major improvements in more detail.
As with 2.8.0, this release comes with a wide range of precompiled libraries (17 in total) for various CPU architectures, operating systems, and C++ compilers. For most platforms both 32-bit and 64-bit variants are provided. Note also that while the libraries are built using specific C++ compiler versions, most of them will also work with newer versions of the same compilers. For example, libraries built with GCC 3.4.x will also work with GCC 4.x.y. Similarly, libraries built with Sun C++ 5.7 (Studio 10) will work with Sun C++ 5.8.
The first thing GNU/Linux, UNIX, and Mac OS X users will notice is the new, automake-based build system for these platforms. There is no more XERCESCROOT or runConfigure and all the standard configure options are supported. There are also a number of options specific to Xerces-C++, all of which can be viewed by executing configure --help. For Windows users the distribution comes with VC++ project files. In this release a set for VC++ 9.0 (2008) was added. Additionally, project files for VC++ 7.1, 8.0, and 9.0 now include targets to build Xerces-C++ with the ICU library as a character transcoder.
Other infrastructure work includes the removal of deprecated components (DepDOM, COM) as well as project files for unmaintained compilers. The documentation was cleaned up and split into website and library categories, with the Xerces-C++ distributions now including only the library documentation (build instructions, programming guides, etc.). Overall, I believe all of this will get the Xerces-C++ project back on a regular release track, with the next release (3.1.0) in about a year.
Now to the new functionality in the library itself. The Xerces-C++ component that got the most work in this release is probably XML Schema. It includes a large number of bug fixes and errata changes. In particular, the long-standing bug that resulted in long execution times and stack overflows on schemas with large minOccurs and maxOccurs values has been fixed. Also, the new interpretation of the ##other namespace designator has been implemented. Related but not limited to XML Schema is the work done to review and clean up all the diagnostic messages issued by Xerces-C++. They were all clarified and now consistently start with a lower-case letter and do not include a period at the end.
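To make the ##other change a bit more concrete, here is a minimal sketch of the matching rule. It assumes that the new interpretation is the one from the XML Schema 1.0 errata, where ##other matches only names that are in some namespace and where that namespace differs from the schema's target namespace, so unqualified names are no longer accepted. The function names and the use of an empty string for "no namespace" are illustrative assumptions and not part of the Xerces-C++ API.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not a Xerces-C++ function.
// An empty string stands for "no namespace" (an unqualified name).
bool otherWildcardAllows(const std::string& targetNs,
                         const std::string& candidateNs)
{
    // Assumed rule: the candidate must be in some namespace and that
    // namespace must differ from the schema's target namespace.
    return !candidateNs.empty() && candidateNs != targetNs;
}

// Small self-check covering the three interesting cases.
void checkOtherWildcard()
{
    const std::string tns = "http://example.com/a";

    assert(otherWildcardAllows(tns, "http://example.com/b")); // foreign namespace
    assert(!otherWildcardAllows(tns, tns));                   // target namespace
    assert(!otherWildcardAllows(tns, ""));                    // no namespace
}
```

If your schemas relied on ##other accepting unqualified elements, this last case is the one to look at when moving to 3.0.0.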
Prior to the 3.0.0 release Xerces-C++ included the draft DOM XPath 1 interfaces that were barely usable and required a lot of casting to the implementation when used with XPath 2 processors such as XQilla. In 3.0.0, the DOM XPath interfaces were extended to support both XPath 1 and XPath 2 data models. As a result, the application can now depend only on interfaces. The 2.2.0 release of XQilla, due in a few weeks, will include support for Xerces-C++ 3.0.0. Furthermore, the 3.0.0 release implements the XML Schema subset of XPath 1 in DOM. This allows you to execute basic XPath queries without requiring a separate XPath processor library.
Another major change in Xerces-C++ 3.0.0 is the porting of all public interfaces and a major part of the implementation to use 64-bit safe types. This means that if you design your application to be 64-bit safe (e.g., use std::size_t for indexes, lengths, etc.), then you don’t need to perform any casts when interfacing with Xerces-C++.
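As a rough illustration of what this buys you, the sketch below iterates over a list-like interface whose length and index types are std::size_t, so the application code needs no casts. The NameList class and the totalNameLength function are hypothetical stand-ins written for this example. They are not Xerces-C++ types, and the actual interfaces should be checked against the 3.0.0 API documentation.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a list-like library interface that uses
// size_t-based lengths and indexes. Not a Xerces-C++ class.
class NameList
{
public:
    explicit NameList(std::vector<std::string> names)
        : names_(std::move(names)) {}

    std::size_t getLength() const { return names_.size(); }

    const std::string& item(std::size_t index) const
    {
        return names_.at(index); // throws std::out_of_range on a bad index
    }

private:
    std::vector<std::string> names_;
};

// Application code written with std::size_t throughout: the loop counter,
// the length, and the accumulated total all have the same type, so no
// narrowing casts are needed on either 32-bit or 64-bit builds.
std::size_t totalNameLength(const NameList& list)
{
    std::size_t total = 0;
    for (std::size_t i = 0; i < list.getLength(); ++i)
        total += list.item(i).size();
    return total;
}
```

Had the interface used unsigned int instead, the same loop would need a cast (or would trigger truncation warnings) on platforms where std::size_t is 64 bits wide.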
Finally, a number of performance-critical parts were optimized for speed in this release. As a result, both DOM parsing and XML Schema validation, for example, are about 25% to 30% faster than in 2.8.0.
When I first started working on the 3.0.0 code base it was quite a mess: the automake-based build system was still unfinished and most of the source code was 64-bit ignorant. At that point I decided that we would need to maintain both 2.8.0 and 3.0.0 in parallel in case the 3.0.0 release happened to be a disaster. This is why the Xerces-C++ project website now includes two sections, one for 2.8.0 and one for 3.0.0. As a release manager, my primary goals for Xerces-C++ 3.0.0 became to make it cleaner, easier to build, and better tested, as well as to provide better XML Schema support. Two betas later, I think 3.0.0 has turned out to be a very solid release, better than 2.8.0 in every aspect, which in retrospect probably made that website split unnecessary. In fact, we were confident enough to build all our XSD 3.2.0 binary distributions with Xerces-C++ 3.0.0.