[xsd-users] Re: Serializer prints xmlns attribute of new elements after emitting thousands of them

Boris Kolpackov boris at codesynthesis.com
Tue Mar 10 11:36:03 EDT 2015


Hi Yury,

Yury Zaytsev <yury.zaytsev at traveltainment.de> writes:

> It looks like I've managed to reproduce this surprising behavior by making 
> minor modifications to the standard streaming example shipped with the 
> latest version of XSD; please find the diff showing the necessary changes, 
> as well as a sample output attached. I've added the loop only to emit
> just a bit over the required magic number of elements; on my machine the
> threshold seems to be ~500 items.

Thanks for the test case! I've managed to reproduce the problem and
did some digging. The problem is with the clearing of the DOM document
(as it happens, after every 500 elements) in order to force the release
of memory. Here is the background:

Xerces-C++ DOM tries to re-use memory blocks that were allocated and
released in the same document but it doesn't do it for every kind of
block. As a result, the memory used by the document will keep growing
if you keep allocating and deallocating elements, which is exactly
what we do in the streaming serializer. To work around this, the
serializer frees the document and allocates a new one after every
500 element create/release cycles.
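
As an illustration only, the workaround can be sketched in plain C++.
This is a toy model and not the XSD or Xerces-C++ source: ToyDocument,
ToyStreamer and run are made-up names, and the "pool" is just a vector
that never shrinks while the document is alive.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a DOM document whose memory pool only grows:
// releasing an element does not give its block back.
class ToyDocument {
public:
  std::size_t create_element(const std::string& name) {
    pool_.push_back(name);
    return pool_.size() - 1;
  }
  void release_element(std::size_t) { /* block stays in the pool */ }
  std::size_t pool_size() const { return pool_.size(); }

private:
  std::vector<std::string> pool_;
};

// Toy streaming serializer: one create/release cycle per element, and
// a fresh document after every `threshold` cycles (assumed > 0).
class ToyStreamer {
public:
  explicit ToyStreamer(std::size_t threshold)
      : threshold_(threshold), doc_(new ToyDocument) {
    assert(threshold > 0);
  }

  void emit(const std::string& name) {
    std::size_t e = doc_->create_element(name);
    // ... the element would be serialized here ...
    doc_->release_element(e);
    if (++cycles_ == threshold_) {
      doc_.reset(new ToyDocument);  // frees the whole old pool at once
      cycles_ = 0;
      ++recreations_;
    }
  }

  std::size_t pool_size() const { return doc_->pool_size(); }
  std::size_t recreations() const { return recreations_; }

private:
  std::size_t threshold_;
  std::size_t cycles_ = 0;
  std::size_t recreations_ = 0;
  std::unique_ptr<ToyDocument> doc_;
};

// Hypothetical helper: stream n elements and return the final state.
ToyStreamer run(std::size_t n, std::size_t threshold) {
  ToyStreamer s(threshold);
  for (std::size_t i = 0; i < n; ++i) s.emit("item");
  return s;
}
```

In this model, run(1200, 500) re-creates the document twice and leaves
only the last 200 blocks in the pool, while a threshold larger than the
element count lets the pool grow to all 1200.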

The problem is, the Xerces-C++ serializer stores shallow copies of
strings when it builds the namespace map during serialization. If
we re-create the document mid-serialization, all the entries in
this map become invalid since they are from the string pool of the
old document. That's the reason for those spurious XML namespace
declarations.
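
The lifetime problem can be shown safely in plain C++ by modelling a
shallow copy as a std::weak_ptr into the document's string pool. This is
only an analogy under that assumption: Xerces-C++ stores raw pointers,
and ToyStringPool, ToyResult and recreate_mid_serialization are made-up
names.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy string pool owned by a document: strings live as long as the pool.
class ToyStringPool {
public:
  std::shared_ptr<const std::string> intern(const std::string& s) {
    pool_.push_back(std::make_shared<std::string>(s));
    return pool_.back();
  }

private:
  std::vector<std::shared_ptr<const std::string>> pool_;
};

struct ToyResult {
  bool shallow_ok;  // is the non-owning entry still usable?
  bool deep_ok;     // is the owned copy still usable?
};

// Hypothetical scenario: fill a namespace map, then replace the pool
// (that is, re-create the document) in the middle of serialization.
ToyResult recreate_mid_serialization() {
  std::unique_ptr<ToyStringPool> pool(new ToyStringPool);

  // prefix -> non-owning reference (models the shallow copy)
  std::map<std::string, std::weak_ptr<const std::string>> shallow;
  // prefix -> owned copy of the URI
  std::map<std::string, std::string> deep;

  {
    std::shared_ptr<const std::string> uri =
        pool->intern("http://example.com/toy");
    shallow["t"] = uri;
    deep["t"] = *uri;
    // While the old pool is alive, the shallow entry is fine.
    assert(!shallow["t"].expired());
  }

  pool.reset(new ToyStringPool);  // old pool and its strings are gone

  ToyResult r;
  r.shallow_ok = !shallow["t"].expired();
  r.deep_ok = deep["t"] == "http://example.com/toy";
  return r;
}
```

Here the weak_ptr makes the stale entry detectable. With raw pointers, as
in a real shallow copy, the entry would simply dangle.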

So far, the best way to fix this that I can think of is to re-create
the namespace map when we are re-creating the document. But I am
going to think a bit more to see if there is a better way.
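
A rough sketch of that idea, again as a toy model rather than the
eventual fix: each map entry is tagged with the "generation" of the
document it came from, and a stale entry is treated as a missing
declaration. ToyEntry, ToySerializer, declarations_for and the generation
counter are all assumptions made for illustration; the real Xerces-C++
namespace map works differently.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// An entry is only trusted while its originating document is alive.
struct ToyEntry {
  std::size_t generation;
  std::string uri;
};

class ToySerializer {
public:
  ToySerializer(std::size_t threshold, bool rebuild_map)
      : threshold_(threshold), rebuild_map_(rebuild_map) {
    assert(threshold > 0);
  }

  void emit(const std::string& prefix, const std::string& uri) {
    std::map<std::string, ToyEntry>::iterator it = ns_map_.find(prefix);
    bool known = it != ns_map_.end() &&
                 it->second.generation == generation_ &&
                 it->second.uri == uri;
    if (!known) {
      ++declarations_;  // an xmlns attribute would be written here
      ns_map_[prefix] = ToyEntry{generation_, uri};
    }
    if (++cycles_ == threshold_) recreate_document();
  }

  std::size_t declarations() const { return declarations_; }

private:
  void recreate_document() {
    ++generation_;  // every existing entry is now stale
    cycles_ = 0;
    if (rebuild_map_) {
      // Sketched fix: re-create the entries against the new document.
      for (auto& kv : ns_map_) kv.second.generation = generation_;
    }
  }

  std::size_t threshold_;
  bool rebuild_map_;
  std::size_t generation_ = 0;
  std::size_t cycles_ = 0;
  std::size_t declarations_ = 0;
  std::map<std::string, ToyEntry> ns_map_;
};

// Hypothetical helper: count declarations written for n elements.
std::size_t declarations_for(std::size_t n, std::size_t threshold,
                             bool rebuild_map) {
  ToySerializer s(threshold, rebuild_map);
  for (std::size_t i = 0; i < n; ++i) s.emit("t", "http://example.com/toy");
  return s.declarations();
}
```

In this model, 1200 elements with a threshold of 500 produce three
declarations when the map is left alone (one initially, one after each
re-creation) and a single declaration when the map is rebuilt.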

Boris


