[xsd-users] Using a SAX parser in cxx-tree
Boris Kolpackov
boris at codesynthesis.com
Tue Aug 10 08:51:12 EDT 2010
Hi Ivan,
Ivan Le Lann <ivan.lelann at free.fr> writes:
> Unless I'm missing something, there is no way to dump an XML document
> into a "xsd cxx-tree" generated class without first creating a Xerces
> DOM document.
Yes, that's correct, though the generated parsing functions do this
automatically.
> It seemed to me that cxx-tree users are paying at least twice their
> memory payload at object creation time.
Yes, that's true, during parsing there is a short period of time when
both the DOM document and the object model are in memory. The memory
requirements, however, can be alleviated by only parsing/serializing
a fragment of the object model at a time. For details, see the
'streaming' example in the examples/cxx/tree/ directory.
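To make the memory argument concrete, here is a toy, self-contained C++ sketch of fragment-at-a-time processing. It is not the code of the 'streaming' example and uses neither Xerces-C++ nor generated types: `item` and `for_each_item` are hypothetical names, and the "parser" merely looks for literal <item> tags. The point is only that each fragment's object is destroyed before the next one is built, so the peak is one object rather than the whole document.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>

// Hypothetical stand-in for a generated type; counts live instances so the
// memory high-water mark can be observed.
struct item
{
  explicit item (int v): value (v)
  {
    if (++live > peak)
      peak = live;
  }
  ~item () { --live; }
  item (const item&) = delete;
  item& operator= (const item&) = delete;

  int value;
  static std::size_t live; // item objects currently in memory
  static std::size_t peak; // largest value live has reached
};

std::size_t item::live = 0;
std::size_t item::peak = 0;

// Handles one <item>N</item> fragment at a time. NOT an XML parser: it only
// searches for the literal tags. Each object is destroyed at the end of its
// loop iteration, before the next fragment is read.
void for_each_item (const std::string& doc,
                    const std::function<void (const item&)>& f)
{
  const std::string open = "<item>", close = "</item>";
  std::size_t pos = 0;

  while ((pos = doc.find (open, pos)) != std::string::npos)
  {
    std::size_t begin = pos + open.size ();
    std::size_t end = doc.find (close, begin);
    if (end == std::string::npos)
      break; // truncated fragment: stop

    std::unique_ptr<item> i (
      new item (std::stoi (doc.substr (begin, end - begin))));
    f (*i);
    pos = end + close.size ();
  }
}
```

For example, summing the values of a three-item document visits every fragment while item::peak never exceeds 1.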
> And that CPU cycles could also be spared here.
Yes, some CPU resources are spent constructing the DOM document.
While there are drawbacks to first creating the DOM document, there
are also a number of features that are made possible due to this
choice. In particular:
1. It is possible to (optionally) maintain a bi-directional association
between DOM nodes and object model nodes. This allows you to have
both a statically-typed and an "untyped" view of the document, with
the latter being useful, for example, for generic traversal.
2. XPath support currently depends on the DOM association.
3. XML Schema wildcard (xs:any and xs:anyAttribute) content is
represented as DOM fragments.
4. The DOM representation can be used to determine the root
element of the document being parsed.
5. It is sometimes required to "touch up" the XML document being
parsed before passing it on to the object model as well as
the DOM document being serialized before saving it to XML.
For example, often legacy systems produce XML without the
required XML namespace declarations. One can use the
intermediate DOM representation to easily fix this.
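Items 4 and 5 can be pictured with a toy, self-contained C++ sketch. The `element` struct is a hypothetical stand-in for a Xerces-C++ DOM element, and `add_namespace`, `classify_root`, `root_kind` and the namespace URI are made up for illustration. With the real DOM, moving an element into a namespace is typically done by renaming the node (for example, with DOMDocument::renameNode()) rather than by assigning a field, and the selected generated parsing function would then be applied to the fixed-up document.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, minimal stand-in for a DOM element.
struct element
{
  std::string ns;   // namespace URI; empty means unqualified
  std::string name; // local name
  std::vector<std::unique_ptr<element>> children;
};

// Item 5: touch-up pass that places every unqualified element in namespace
// ns, as one might do for legacy XML missing its namespace declaration.
void add_namespace (element& e, const std::string& ns)
{
  if (e.ns.empty ())
    e.ns = ns;
  for (auto& c : e.children)
    add_namespace (*c, ns);
}

// Item 4: pick a parsing path from the root element's qualified name.
// The namespace URI and root names are invented for this sketch.
enum class root_kind { catalog, order, unknown };

root_kind classify_root (const element& root)
{
  const std::string lib = "http://www.example.com/lib";
  if (root.ns == lib && root.name == "catalog")
    return root_kind::catalog;
  if (root.ns == lib && root.name == "order")
    return root_kind::order;
  return root_kind::unknown;
}
```

Running the touch-up before the root check is what lets a namespace-less legacy document be recognized as a catalog; without it, classify_root() reports unknown.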
> After a quick look at the XSD source code,
> I'm willing to try and implement a SAX constructor for cxx-tree classes.
>
> Before starting, I'd like to know if this attempt is:
>
> 1) useless because already present in xsd
No, there is no support for SAX-based parsing in C++/Tree. However,
there is the C++/Hybrid mapping in XSD/e[1] which is, roughly speaking,
a light-weight version of C++/Tree and is based on SAX. You may want
to consider this first.
> 2) useless because it won't improve memory or speed for a reason I'm missing
I think it will improve both memory usage and speed.
> 3) long-awaited and welcome! :)
You are definitely welcome to try to implement this. However, I think an
easier approach would be to first implement an "XML Reader" API on top of
Xerces-C++ SAX2 using the progressive mode feature (see the 'streaming'
example mentioned above for some ideas on how this might work). It would
be much easier to use than callback-based SAX, since you can use it in
much the same way as the DOM is used now.
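The "XML Reader" idea can be sketched as a toy, self-contained C++ model; this is not Xerces-C++ code. `progressive_parser` replays a prepared list of events and stands in for SAX2XMLReader's progressive calls (parseFirst()/parseNext()), each of which may deliver several callbacks; `xml_reader`, `event`, `handler` and `read_names` are hypothetical names. The reader registers itself as the callback handler, queues whatever the parser pushes, and advances the parser only when the caller asks for the next event.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

enum class event_kind { start_element, end_element, characters, eof };

struct event
{
  event_kind kind;
  std::string text; // element name or character data
};

// Callback interface, similar in spirit to a SAX2 content handler.
struct handler
{
  virtual ~handler () = default;
  virtual void on_event (const event& e) = 0;
};

// Toy push parser with a progressive mode: each parse_next() call pushes
// the next pre-built event to the handler; returns false at the end.
class progressive_parser
{
public:
  progressive_parser (std::vector<event> events, handler& h)
    : events_ (std::move (events)), h_ (h) {}

  bool parse_next ()
  {
    if (pos_ == events_.size ())
      return false;
    h_.on_event (events_[pos_++]);
    return true;
  }

private:
  std::vector<event> events_;
  std::size_t pos_ = 0;
  handler& h_;
};

// Pull-style reader: callers ask for events instead of receiving callbacks.
class xml_reader: private handler
{
public:
  explicit xml_reader (std::vector<event> events)
    : parser_ (std::move (events), *this) {}

  // parser_ refers back to *this, so copying would leave a dangling reference.
  xml_reader (const xml_reader&) = delete;
  xml_reader& operator= (const xml_reader&) = delete;

  event next ()
  {
    while (queue_.empty ())
      if (!parser_.parse_next ())
        return event {event_kind::eof, ""};

    event e (std::move (queue_.front ()));
    queue_.pop_front ();
    return e;
  }

private:
  void on_event (const event& e) override { queue_.push_back (e); }

  progressive_parser parser_;
  std::deque<event> queue_;
};

// DOM-like, top-down use: collect the text of every <name> element
// without writing any callbacks.
std::vector<std::string> read_names (xml_reader& r)
{
  std::vector<std::string> names;
  for (event e = r.next (); e.kind != event_kind::eof; e = r.next ())
    if (e.kind == event_kind::start_element && e.text == "name")
    {
      event t = r.next ();
      if (t.kind == event_kind::characters)
        names.push_back (t.text);
    }
  return names;
}
```

Because the caller drives the loop, parsing code can be written top-down (expect a <name> start tag, then read its text), much as one walks a DOM tree. In a real implementation backed by progressive parsing, the queue should only ever hold the events produced by the most recent parse step.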
[1] http://www.codesynthesis.com/products/xsde/
Boris