[xsd-users] dealing with xml written/read on-the-fly

Thu Oct 22 09:32:01 EDT 2009

Hi Bill,

Bill Pringlemeir <bpringle at sympatico.ca> writes:

> You can use a SAX parser (in Xerces as well) to parse available text.
> When a known top level chunk tag is recognized, it can then be passed
> to a normal XSD de-serializer.  The overhead for SAX parsing of tags
> is fairly minimal.  Also, if you have a partial buffer, you would have
> to keep the 'partial' XSD data model around to restart when more text
> was found.  So having to keep the Valgrind text until SAX recognizes a
> completion tag isn't really the worst overhead.  Having XSD (or
> whatever solution) keep the entire DOM tree, a data model, and
> possibly the text is definitely worse.
> 
> The only objection I see is that XSD should provide a mode to not
> attempt to read further.  I think you could provide your own stream
> and throw an exception if it reads beyond the known size.  That should
> be a sanity check as opposed to a normal mode of operation.  You might
> have to create some XML text that doesn't match the schema to test
> this.
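
For reference, the guard-stream idea quoted above could look roughly like
the sketch below. It is only an illustration: bounded_buf and
read_within_bounds are hypothetical names, not part of XSD or Xerces-C++,
and it assumes the chunk is already in memory with a known size. Note that
std::istream swallows exceptions thrown by its buffer unless badbit is
enabled in the exception mask.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <stdexcept>
#include <streambuf>
#include <string>

// Hypothetical helper (not part of XSD or Xerces-C++): a read-only stream
// buffer over a fixed block of text that throws if a reader asks for data
// past the end, instead of quietly reporting end-of-file.
class bounded_buf: public std::streambuf
{
public:
  bounded_buf (const char* data, std::size_t size)
  {
    char* p = const_cast<char*> (data); // setg() wants char*; never written.
    setg (p, p, p + size);
  }

protected:
  // Called only once the get area is exhausted, that is, on a read past
  // the known size.
  int_type underflow () override
  {
    throw std::runtime_error ("read beyond the known chunk size");
  }
};

// Try to read 'want' bytes from a bounded stream over 'text'. Returns true
// if the read stayed within bounds and false if the guard fired.
bool read_within_bounds (const std::string& text, std::size_t want)
{
  bounded_buf buf (text.data (), text.size ());
  std::istream is (&buf);
  is.exceptions (std::ios::badbit); // let the buffer's exception propagate

  std::string out (want, '\0');
  try
  {
    is.read (&out[0], static_cast<std::streamsize> (want));
    return true;
  }
  catch (const std::runtime_error&)
  {
    return false;
  }
}
```

Reading exactly the known size succeeds, while asking for one more byte
trips the guard, which fits the "sanity check" role described in the quote.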

The approach shown in the streaming example[1] is much better. It uses
a little-known Xerces-C++ feature called progressive parsing. In this
mode Xerces-C++ returns control to the caller after parsing each token in
the XML document. The streaming example uses SAX2 in progressive mode
to build and return one DOM chunk at a time, each corresponding to one
second-level element. This DOM fragment is then parsed into an object
model fragment, which can be processed and discarded or added to the
object model.
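
To make the control flow concrete, here is a toy, standard-library-only
sketch of the same idea. It is not the Xerces-C++ API and not the code
from the streaming example: pull_tokenizer and second_level_chunks are
made-up names, and the tokenizer assumes well-formed input with no XML
declaration, comments, CDATA, processing instructions, or '>' inside
attribute values. The caller asks for one token at a time, tracks the
element depth, and cuts out each second-level element as its own chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a progressive parser; NOT the Xerces-C++ API.
enum class token_kind { start, end, empty, text, eof };

struct token
{
  token_kind kind;
  std::string raw; // exact source text of the token
};

class pull_tokenizer
{
public:
  explicit pull_tokenizer (std::string xml): xml_ (std::move (xml)), pos_ (0) {}

  // Scan exactly one token and hand control back to the caller.
  token next ()
  {
    if (pos_ >= xml_.size ())
      return {token_kind::eof, ""};

    if (xml_[pos_] != '<') // character data up to the next tag
    {
      std::size_t e = xml_.find ('<', pos_);
      if (e == std::string::npos) e = xml_.size ();
      token t {token_kind::text, xml_.substr (pos_, e - pos_)};
      pos_ = e;
      return t;
    }

    std::size_t e = xml_.find ('>', pos_);
    if (e == std::string::npos) // incomplete tag: treat as end of input
    {
      pos_ = xml_.size ();
      return {token_kind::eof, ""};
    }

    std::string raw = xml_.substr (pos_, e - pos_ + 1);
    pos_ = e + 1;

    token_kind k = token_kind::start;
    if (raw[1] == '/') k = token_kind::end;
    else if (raw[raw.size () - 2] == '/') k = token_kind::empty;
    return {k, raw};
  }

private:
  std::string xml_;
  std::size_t pos_;
};

// Drive the tokenizer one token at a time and collect the source text of
// each second-level element (a direct child of the root) as its own chunk.
std::vector<std::string> second_level_chunks (const std::string& xml)
{
  pull_tokenizer t (xml);
  std::vector<std::string> chunks;
  std::string current;
  int depth = 0;

  for (token tok = t.next (); tok.kind != token_kind::eof; tok = t.next ())
  {
    if (tok.kind == token_kind::start) ++depth;

    // Inside a second-level element, or at an empty one right under root.
    bool in_chunk = depth >= 2 ||
                    (depth == 1 && tok.kind == token_kind::empty);
    if (in_chunk) current += tok.raw;

    if (tok.kind == token_kind::end) --depth;

    if (depth == 1 && !current.empty ()) // chunk complete
    {
      chunks.push_back (current);
      current.clear ();
    }
  }
  return chunks;
}
```

The sketch collects the chunks in a vector only to keep it easy to check.
In a streaming setup each chunk would instead be handed to the consumer as
soon as it is complete and then dropped, so that only one fragment is held
in memory at a time.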

[1] http://www.codesynthesis.com/~boris/tmp/xsd-3.2.0-streaming.tar.gz
    http://www.codesynthesis.com/~boris/tmp/xsd-3.2.0-streaming.zip

Boris