Introduction
This document describes the architecture of the XSDBench XML Schema benchmark. It discusses the XML Schema definitions and XML instances used in the tests, as well as the tests themselves.
Schemas and Instances
Most practical XML Schema definitions describe two aspects of the conforming XML instances: structure and content. The XML Schema specification itself is split into two parts along these lines: XML Schema Part 1: Structures and XML Schema Part 2: Datatypes.
While complexity varies, all non-trivial applications have structure in their XML instances. The degree to which they use content validation, however, varies dramatically. At one extreme are applications that use no built-in types except xsd:string; such applications perform no content validation at all. Quite a few applications use only a handful of built-in types, mostly strings and numbers, and in them the content validation overhead is minimal compared to that of structure validation. At the other extreme are applications that make extensive use of relatively expensive content validation features such as regular expressions and enumerations.
To be useful across this range of applications, the benchmark will provide two sets of tests: the first measures structure validation and the second measures content validation. Application developers can then use the results to predict the performance of their own application by taking into account how heavily their schemas rely on content versus structure.
We expect that in most applications the overhead of structure validation greatly outweighs that of content validation. This version of the benchmark therefore includes only the structure validation tests; content validation tests are planned for upcoming releases.
The structure.xsd schema tests the following commonly used XML Schema constructs:
- attribute
- anyAttribute
- element
- any
- all
- choice
- sequence
- complex type empty content, including extension and restriction
- complex type simple content, including extension and restriction
- complex type complex content, including extension and restriction
The test instance is in structure.xml.
Tests
The tests measure the time it takes to parse and validate the given XML instance. In order of preference, the APIs used are SAX (push), pull, and DOM: if a parser supports XML Schema validation through its SAX API, SAX is used; otherwise the next API in the list is considered. The following table shows the API used for each parser:
Parser | API |
---|---|
Xerces-C++ | SAX |
CodeSynthesis XSD | SAX-like C++/Parser |
Libxml2 | Pull-style |
MSXML | SAX |
Oracle XDK | DOM |
Each parser test performs the following steps:
- Read the XML instance into memory
- Create a parser
- Load the XML Schema and cache it in the parser
- Start the timer
- Perform 10,000 parse iterations, resetting the parser if necessary
- Stop the timer
- Destroy the parser