[xsd-users] Using a SAX parser in cxx-tree

Boris Kolpackov boris at codesynthesis.com
Tue Aug 10 08:51:12 EDT 2010


Hi Ivan,

Ivan Le Lann <ivan.lelann at free.fr> writes:

> Unless I'm missing something, there is no way to dump an XML document
> into a "xsd cxx-tree" generated class without first creating a Xerces 
> DOM document.

Yes, that's correct, though the generated parsing functions create the
DOM document for you automatically.


> It seemed to me that cxx-tree users are paying at least twice their 
> memory payload at object creation time. 

Yes, that's true: during parsing there is a short period of time when
both the DOM document and the object model are in memory. The memory
requirements, however, can be alleviated by only parsing/serializing
a fragment of the object model at a time. For details, see the
'streaming' example in the examples/cxx/tree/ directory.


> And that cpu cycles could also be spared here.

Yes, some CPU resources are spent constructing the DOM document.

While there are drawbacks to first creating the DOM document, there
are also a number of features that this choice makes possible. In
particular:

1. It is possible to (optionally) maintain a bi-directional association
   between DOM nodes and object model nodes. This allows you to have
   both a statically-typed and an "untyped" view of the document, with
   the latter being useful, for example, for generic traversal.

2. XPath support currently depends on the DOM association.

3. XML Schema wildcard (xs:any and xs:anyAttribute) content is
   represented as DOM fragments.

4. The DOM representation can be used to determine the root
   element of the document being parsed.

5. It is sometimes necessary to "touch up" the XML document being
   parsed before passing it on to the object model, or the DOM
   document being serialized before saving it to XML. For example,
   legacy systems often produce XML without the required XML
   namespace declarations. One can use the intermediate DOM
   representation to easily fix this.
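As a rough illustration of point 5, the namespace fix-up amounts to a
simple tree walk. This is only a sketch: `Node` and `addMissingNamespace`
are hypothetical and operate on a toy tree rather than a Xerces-C++ DOM.
With Xerces-C++ itself one would typically rename each affected element
in place (for example with DOMDocument::renameNode() in Xerces-C++ 3.x)
before handing the document to the generated parsing function.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, minimal stand-in for a DOM element; not the Xerces-C++ API.
struct Node {
  std::string namespaceUri;  // empty means the element is in no namespace
  std::string localName;
  std::vector<std::unique_ptr<Node>> children;
};

// Moves every element that is in no namespace into `ns`, recursively.
// Assumes the schema uses elementFormDefault="qualified"; with unqualified
// local elements only the root element would need to be moved.
void addMissingNamespace(Node& n, const std::string& ns) {
  assert(!ns.empty() && "target namespace must be non-empty");
  if (n.namespaceUri.empty()) n.namespaceUri = ns;
  for (auto& child : n.children) addMissingNamespace(*child, ns);
}
```

Elements that already carry a namespace are left untouched, so the
fix-up is safe to apply to documents that are only partially broken.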


> After a quick look at XSD source code,
> I'm willing to try and implement a SAX constructor for cxx-tree classes.
> 
> Before starting, I'd like to know if this attempt is :
> 
> 1) useless because already present in xsd

No, there is no support for SAX-based parsing in C++/Tree. However,
there is the C++/Hybrid mapping in XSD/e [1], which is, roughly
speaking, a lightweight version of C++/Tree and is based on SAX. You
may want to consider it first.


> 2) useless because it won't improve memory or speed for a reason I'm missing

I think it will improve both memory usage and speed.


> 3) long-awaited and welcome !  :)

You are definitely welcome to try to implement this. However, I think an
easier approach would be to first implement an "XML Reader" API on top of
Xerces-C++ SAX2 using the progressive mode feature (see the 'streaming'
example mentioned above for some ideas on how this might work). Such an
API would be much easier to use than callback-based SAX, since you could
use it in a way very similar to how DOM is used now.
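The "XML Reader" idea could be sketched roughly as follows. Everything
here is hypothetical: the event types, `ToyPushParser`, `XmlReader`,
`Person` and `parsePerson` are illustrative names, and `ToyPushParser`
replays a pre-built event list instead of scanning XML. It stands in for
Xerces-C++ SAX2 progressive parsing (SAX2XMLReader::parseFirst() and
parseNext()), where each step delivers only a few callbacks. The reader
turns those callbacks into a pull interface, so parsing code can request
events in document order much as it walks DOM children today.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Simplified SAX2-style events (no attributes or namespaces).
enum class EventKind { StartElement, Characters, EndElement, EndOfDocument };

struct Event {
  EventKind kind;
  std::string text;  // element name or character data
};

// Callback interface, similar in spirit to a SAX2 content handler.
struct EventHandler {
  virtual ~EventHandler() = default;
  virtual void startElement(const std::string& name) = 0;
  virtual void characters(const std::string& data) = 0;
  virtual void endElement(const std::string& name) = 0;
};

// Hypothetical stand-in for a progressive push parser: each parseNext()
// fires at most one callback and returns false once the input is exhausted.
// It replays a pre-built event list instead of scanning real XML.
class ToyPushParser {
 public:
  ToyPushParser(std::vector<Event> script, EventHandler& handler)
      : script_(std::move(script)), handler_(handler) {}

  bool parseNext() {
    if (pos_ == script_.size()) return false;
    const Event& e = script_[pos_++];
    if (e.kind == EventKind::StartElement) handler_.startElement(e.text);
    else if (e.kind == EventKind::Characters) handler_.characters(e.text);
    else if (e.kind == EventKind::EndElement) handler_.endElement(e.text);
    return true;
  }

 private:
  std::vector<Event> script_;
  std::size_t pos_ = 0;
  EventHandler& handler_;
};

// Pull-style reader: next() advances the push parser only as far as needed
// to produce one event, so callers consume the document sequentially.
class XmlReader : private EventHandler {
 public:
  explicit XmlReader(std::vector<Event> script)
      : parser_(std::move(script), *this) {}
  XmlReader(const XmlReader&) = delete;  // parser_ holds a reference to *this
  XmlReader& operator=(const XmlReader&) = delete;

  Event next() {
    while (queue_.empty()) {
      if (!parser_.parseNext()) return {EventKind::EndOfDocument, ""};
    }
    Event e = queue_.front();
    queue_.pop_front();
    return e;
  }

 private:
  void startElement(const std::string& n) override {
    queue_.push_back({EventKind::StartElement, n});
  }
  void characters(const std::string& d) override {
    queue_.push_back({EventKind::Characters, d});
  }
  void endElement(const std::string& n) override {
    queue_.push_back({EventKind::EndElement, n});
  }

  std::deque<Event> queue_;  // events produced but not yet consumed
  ToyPushParser parser_;
};

// Example object model type plus the kind of parse function generated code
// might contain. Expects <person> with flat, text-only children.
struct Person {
  std::string name;
};

Person parsePerson(XmlReader& r) {
  Person p;
  Event e = r.next();
  assert(e.kind == EventKind::StartElement && e.text == "person");
  for (e = r.next(); e.kind == EventKind::StartElement; e = r.next()) {
    Event t = r.next();
    if (t.kind == EventKind::Characters) {
      if (e.text == "name") p.name = t.text;
      t = r.next();  // consume the matching end tag
    }
    assert(t.kind == EventKind::EndElement && t.text == e.text);
  }
  return p;  // the loop stops at </person>
}
```

The queue exists because a real progressive parse step may fire several
callbacks at once; buffering them keeps next() returning exactly one
event while holding only the not-yet-consumed events in memory.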

[1] http://www.codesynthesis.com/products/xsde/

Boris
