Introduction

This document describes the architecture of the XSDBench XML Schema benchmark. It discusses the XML Schema definitions and XML instances used in the tests, as well as the tests themselves.

Schemas and Instances

Most practical XML Schema definitions describe two aspects of the conforming XML instances: structure and content. The XML Schema specification itself is split into two parts along these lines: XML Schema Part 1: Structures and XML Schema Part 2: Datatypes.

While the complexity varies, all non-trivial applications have structure in their XML instances. However, the extent of content validation can vary dramatically. At one extreme are applications that do not use any built-in types except xsd:string. In such applications there is no content validation at all. There are quite a few applications that use only a handful of built-in types, mostly strings and numbers. The content validation overhead in such applications is minimal compared to the structure validation. At the other extreme are applications that make extensive use of the relatively expensive content validation features such as regular expressions and enumerations.

To be useful, this benchmark will provide two sets of tests: the first set measures structure validation and the second set measures content validation. Application developers can then use the results to predict the performance of their application by taking into account the usage of content versus structure in their schemas.

We expect that in most applications the structure validation overhead will greatly outweigh that of the content validation. We therefore start with the structure validation tests in this version of the benchmark and plan to add the content validation tests in upcoming releases.

The structure.xsd schema tests the following commonly-used XML Schema constructs:

The test instance is in structure.xml.

Tests

The tests measure the time it takes to parse and validate the given XML instance. The APIs used, in order of preference, are SAX (push), pull, and DOM. In other words, if the parser supports SAX and can perform XML Schema validation through this API, then SAX is used; otherwise the next API is considered. The following table indicates the API used for each parser:

Parser               API
Xerces-C++           SAX
CodeSynthesis XSD    SAX-like C++/Parser
Libxml2              Pull-style
MSXML                SAX
Oracle XDK           DOM
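
The preference order described above (SAX, then pull, then DOM) can be sketched as a small selection function. This is only an illustration in Python: the API_PREFERENCE tuple, the choose_api function, and the lowercase API labels are hypothetical names introduced here and are not part of the benchmark itself.

```python
from typing import Iterable, Optional

# Preference order stated in the text: SAX (push) first, then pull, then DOM.
# The lowercase labels are illustrative, not identifiers used by XSDBench.
API_PREFERENCE = ("sax", "pull", "dom")


def choose_api(validating_apis: Iterable[str]) -> Optional[str]:
    """Return the most preferred API the parser can validate with.

    `validating_apis` lists the APIs through which a parser supports
    XML Schema validation. Returns None if none of them is recognized.
    """
    supported = set(validating_apis)
    for api in API_PREFERENCE:
        if api in supported:
            return api
    return None
```

For example, a parser that can validate only through its pull and DOM interfaces would be tested with the pull interface, since choose_api({"pull", "dom"}) returns "pull".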

Each parser test performs the following steps:

  1. Read XML instance into memory
  2. Create a parser
  3. Load and cache XML Schema in the parser
  4. Start timer
  5. Perform 10000 parse iterations, resetting the parser if necessary
  6. Stop timer
  7. Destroy the parser
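
The steps above can be sketched as a small timing harness. This is a minimal illustration in Python, not the benchmark's actual code: StubParser and its load_schema, reset, and parse methods are hypothetical stand-ins for a real validating parser, and run_benchmark is a name chosen here. The point it shows is that parser creation and schema loading happen before the timer starts, so only the parse iterations are measured.

```python
import time


class StubParser:
    """Hypothetical stand-in for a validating parser (not a real library API)."""

    def __init__(self) -> None:
        self.schema = None
        self.parse_count = 0

    def load_schema(self, schema_text: str) -> None:
        # Load the schema once so it stays cached across all iterations.
        self.schema = schema_text

    def reset(self) -> None:
        # Some parsers need a reset between documents; nothing to do here.
        pass

    def parse(self, instance: bytes) -> None:
        # A real parser would parse and validate `instance` at this point.
        if self.schema is None:
            raise RuntimeError("schema not loaded")
        self.parse_count += 1


def run_benchmark(instance, schema_text, iterations=10000,
                  parser_factory=StubParser):
    """Time `iterations` parses of an in-memory instance; return seconds."""
    # Step 1 is done by the caller: `instance` is already in memory.
    parser = parser_factory()               # step 2: create a parser
    parser.load_schema(schema_text)         # step 3: load and cache the schema
    start = time.perf_counter()             # step 4: start timer
    for _ in range(iterations):             # step 5: parse iterations
        parser.parse(instance)
        parser.reset()
    elapsed = time.perf_counter() - start   # step 6: stop timer
    del parser                              # step 7: destroy the parser
    return elapsed
```

A caller would first read the test instance (structure.xml in this benchmark) into a bytes object and then pass it to run_benchmark, so that file I/O is excluded from the measured interval. Dividing the returned time by the iteration count gives the average time per parse.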