[xsd-users] Keep a xsd library small

Fri Feb 13 08:56:49 EST 2009

Hi Angelo,

Angelo Difino <angelo at cedeo.net> writes:

> first of all I'd like to say that xsd-codesynthesis rocks: after just a
> few days with it I was able to create the C++ classes for a quite
> 'complicated' set of schemas (that was already posted here a few months
> ago).

Thanks, I am glad you find it useful.

> Since this set of schemas suffers from cyclic dependencies with
> inheritance, I'm using the file-per-type option. I'm running the latest
> release (3.2.0) of XSD on win32/vista...
>
> Everything works great, but the problem is the huge number of files
> and the size of the result when I try to compile it (I'm using MS
> Visual C++ 2003). The header and source files number 872 and the built
> library is 1GB in size

You are probably building a static library. Unfortunately, in
the file-per-type mode static libraries for non-trivial schemas
are bound to be quite large. This is due to the large number of
source files, each of which, when compiled, includes its own
instantiations of some common templates (this is especially true
when the --generate-polymorphic option is used). A static library
is pretty much an archive of all the object files. When it is
linked into an executable, the linker removes all those duplicate
template instantiations, so the resulting binary will have the
same size as if you had used the file-per-schema mode.

One solution to this problem is to use a shared library (DLL)
instead of a static library since a shared library is more like
an executable in that all the duplicate template instantiations
are removed. For example, I compiled your schemas on my GNU/Linux
box and while the static library is 227MB, the shared library is
10MB.
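
To compare the two on your own system, something along these
lines should work. This is only a sketch: it assumes GCC on
GNU/Linux and that the generated *.cxx files are in the current
directory. The build_libs function and the library name passed
to it are placeholders, not something XSD provides, and you will
need to add your usual include options.

```shell
# build_libs NAME: compile every *.cxx file in the current directory
# and produce both libNAME.a and libNAME.so for a size comparison.
# (Hypothetical helper; assumes g++ and ar are available.)
build_libs() {
  name=$1

  # Position-independent code is required for the shared library and
  # is harmless for the static one.
  for f in *.cxx; do
    g++ -fPIC -c "$f" || return 1
  done

  # Static library: an archive of the object files, so every duplicate
  # template instantiation is kept.
  ar rcs "lib$name.a" *.o || return 1

  # Shared library: goes through the linker, which merges the
  # duplicate instantiations.
  g++ -shared -o "lib$name.so" *.o || return 1

  ls -l "lib$name.a" "lib$name.so"
}
```

Run it in the directory with the generated sources, for example
as 'build_libs didl', and compare the two sizes that ls prints.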

Here are some more tips for reducing the size/compilation time:

1. Specify the root element with the --root-element option. In your
   schema I eliminated about a hundred parsing/serialization
   functions by adding '--root-element DIDL' to the command
   line.

2. Your schema is composed of several lower-level schema subsets.
   If only one of the lower-level subsets involves the cyclic
   dependency, then you can compile only this subset in the
   file-per-type mode and the rest in the default, file-per-
   schema mode. Note that here the subset needs to be fairly
   isolated in that all the schemas that it includes/imports
   will be handled in the file-per-type mode. In your case,
   there are two subsets that involve cyclic dependencies 
   (rel-*.xsd and ipmpmsg.xsd/ipmpinfo.xsd) so the bulk of
   the schema has to be compiled in the file-per-type mode.
   However, there are still a few files that can be compiled
   in the file-per-schema mode, namely, didl.xsd, didl-msx.xsd,
   and mpeg4smp.xsd.

   I compiled your schemas like so:

   xsd cxx-tree --file-per-type ... rel-r.xsd ipmpinfo.xsd
   xsd cxx-tree ... didl.xsd didl-msx.xsd mpeg4smp.xsd 

   And the static library size went down to 175MB.

3. You can also try to split the offending schemas into two or
   more files so that they don't involve cyclic dependencies
   with inheritance (the resulting schema will be semantically
   equivalent to the original). Then you can use the file-per-
   schema mode.

4. When using the file-per-type mode it is recommended to use
   precompiled headers to speed up compilation. You would normally
   include xml-schema.hxx into the precompiled header and then
   include the precompiled header into each generated source
   file using the --cxx-prologue option.
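
As a rough illustration of tip 4, the setup could look like the
sketch below. The header name pch.hxx is a placeholder, and the
xsd command is only printed for review, not run. The "..." stands
for the rest of your options, as in the commands above.

```shell
# Write the header that will be precompiled. The name pch.hxx is a
# placeholder; xml-schema.hxx is the header mentioned in tip 4.
cat > pch.hxx <<'EOF'
#ifndef PCH_HXX
#define PCH_HXX
#include "xml-schema.hxx"
#endif
EOF

# Text that --cxx-prologue inserts at the beginning of each generated
# source file.
prologue='#include "pch.hxx"'

# Print the resulting command line; "..." is the elided remainder of
# the options and has to be filled in before running it.
echo "xsd cxx-tree --file-per-type --cxx-prologue '$prologue' ... rel-r.xsd ipmpinfo.xsd"
```

How the header is then precompiled depends on the compiler: with
GCC it would be something like 'g++ -x c++-header pch.hxx', while
Visual C++ uses the /Yc and /Yu options.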

Boris