[xsd-users] Parsing and serializing large documents

Boris Kolpackov boris at codesynthesis.com
Mon Feb 16 08:32:42 EST 2009


Hi Anatoly,

Anatoly Borodyansky <aborodya at yahoo.com> writes:

> If I need to parse a large XML file, adjust some elements, and 
> serialize the whole structure again, what is the best way to do 
> it (it works beautifully with Tree, but my input is too large
> for it to handle)?

In XSD, currently, the only way is to use the C++/Parser mapping
for parsing and perform custom XML serialization (you can either
do it by hand if the XML vocabulary is quite simple or use an XML
serializer such as GenX[1], libxml2, etc.) as you parse the input
stream.
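
As a rough illustration of the "serialize by hand as you parse" idea,
here is a minimal sketch. It does not use the real XSD-generated
skeletons: the record_rewriter class, its callback names and the
records/record/name/value vocabulary are all hypothetical stand-ins
for whatever your schema produces. The point is only that each
callback writes its piece of output immediately, so at most one
record is held in memory at a time.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Escape the characters that cannot appear verbatim in XML character data.
std::string escape_text (const std::string& s)
{
  std::string r;
  r.reserve (s.size ());

  for (char c : s)
  {
    switch (c)
    {
    case '&': r += "&amp;"; break;
    case '<': r += "&lt;";  break;
    case '>': r += "&gt;";  break;
    default:  r += c;       break;
    }
  }

  return r;
}

// Hypothetical callback receiver (an assumption, not the XSD API).
// A real C++/Parser implementation would override the callbacks of
// the generated parser skeletons and do the same kind of work there.
class record_rewriter
{
public:
  explicit record_rewriter (std::ostream& os): os_ (os) {}

  void start_document () { os_ << "<records>"; in_document_ = true; }

  // Per-record data is buffered only until the record ends.
  void name (const std::string& n) { name_ = n; }
  void value (int v) { value_ = v; }

  void end_record ()
  {
    assert (in_document_);

    // The "adjust some elements" step: here we simply increment value.
    os_ << "<record><name>" << escape_text (name_) << "</name>"
        << "<value>" << value_ + 1 << "</value></record>";

    name_.clear ();
    value_ = 0;
  }

  void end_document () { os_ << "</records>"; in_document_ = false; }

private:
  std::ostream& os_;
  std::string name_;
  int value_ = 0;
  bool in_document_ = false;
};
```

In a real application the output stream would be a file stream and
the callbacks would be driven by the parser rather than called
directly.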

In XSD/e[2], our mobile/embedded systems version of XSD, we have
the C++/Serializer mapping which is a counterpart to the C++/Parser
mapping in that it provides event-driven, stream-oriented XML
serialization. You can certainly use XSD/e on general-purpose
platforms as well.

The only problem with using the C++/Parser and C++/Serializer
mappings to perform the transformation you have described is
that it is not easy to "pipe-line" them because they both are
callback-based. One way to overcome this would be to have two
threads with the first performing parsing and sending the modified
data to the second thread which performs serialization (it is
also possible to achieve the same with parser suspension though
it is more complex).
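
The two-thread arrangement can be sketched roughly as follows. This
is only an illustration under assumptions: the record struct, the
record_queue class and run_pipeline() are hypothetical names, the
"parser" is simulated by a loop, and the serializer writes XML by
hand instead of going through C++/Serializer. It needs C++17 for
std::optional. The queue is bounded so that the parsing side cannot
run far ahead of the serializing side and pull the whole document
into memory.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <sstream>
#include <string>
#include <thread>
#include <utility>

// Hypothetical unit of data handed from the parser to the serializer.
struct record
{
  std::string name;
  int value;
};

// Bounded, thread-safe queue connecting the two threads.
class record_queue
{
public:
  explicit record_queue (std::size_t capacity): cap_ (capacity) {}

  // Called by the parsing thread; blocks while the queue is full.
  void push (record r)
  {
    std::unique_lock<std::mutex> l (m_);
    not_full_.wait (l, [this] { return q_.size () < cap_; });
    q_.push (std::move (r));
    l.unlock ();
    not_empty_.notify_one ();
  }

  // Called by the parsing thread once the input has been consumed.
  void close ()
  {
    {
      std::lock_guard<std::mutex> l (m_);
      closed_ = true;
    }
    not_empty_.notify_all ();
  }

  // Called by the serializing thread; returns an empty optional once
  // the queue has been closed and drained.
  std::optional<record> pop ()
  {
    std::unique_lock<std::mutex> l (m_);
    not_empty_.wait (l, [this] { return !q_.empty () || closed_; });

    if (q_.empty ())
      return std::nullopt;

    record r (std::move (q_.front ()));
    q_.pop ();
    l.unlock ();
    not_full_.notify_one ();
    return r;
  }

private:
  std::mutex m_;
  std::condition_variable not_empty_, not_full_;
  std::queue<record> q_;
  std::size_t cap_;
  bool closed_ = false;
};

// Runs the pipeline for n simulated records and returns the output.
std::string run_pipeline (int n)
{
  record_queue q (4);
  std::ostringstream out;

  // Second thread: serialization.
  std::thread serializer ([&q, &out]
  {
    out << "<records>";
    while (std::optional<record> r = q.pop ())
      out << "<record name=\"" << r->name << "\">" << r->value
          << "</record>";
    out << "</records>";
  });

  // First thread (the caller): stands in for the parser callbacks,
  // which would modify the data and push it as each record completes.
  for (int i = 0; i < n; ++i)
    q.push (record {"r" + std::to_string (i), i * 10});

  q.close ();
  serializer.join ();
  return out.str ();
}
```

With a single producer and a single consumer the records come out in
the order they were pushed, so the output document preserves the
input order.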

There is also a third mapping, C++/Hybrid, which can greatly
simplify passing the data around since it provides a lightweight
object model on top of C++/Parser and C++/Serializer.
The 'streaming' example in the examples/cxx/hybrid/ directory
in the XSD/e distribution shows how to perform partially
in-memory, partially event-driven XML processing (that is, you get
the convenience of C++/Tree but handle the document in chunks).

[1] http://www.tbray.org/ongoing/When/200x/2004/02/20/GenxStatus
[2] http://www.codesynthesis.com/products/xsde/

Boris



