[xsd-users] Non-xml data in attributes with xsd:string, xsd:normalizedString

Boris Kolpackov boris at codesynthesis.com
Thu Sep 24 09:22:49 EDT 2009


Hi Bill,

Bill Pringlemeir <bpringle at sympatico.ca> writes:

> On 26 Jun 2009, boris at codesynthesis.com wrote:
>
> > 1. The encoding in the object model. This is by default UTF-8.
> 
> > 2. The UTF-16 encoding used in Xerces-C++.
> 
> > 3. The encoding of the resulting XML document. This can be specified
> > in the serialization function.
> 
> 
> In layer '1.' above is UTF-8 only due to xsd::cxx::xml::transcode() 
> implementation?

You need to specify some encoding for the text in the object model.
Otherwise it is not clear how to convert from/to UTF-16.
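
To illustrate why the encoding has to be fixed, here is a minimal
sketch. The two helper functions are hypothetical and are not part
of libxsd or Xerces-C++; char16_t stands in for a UTF-16 code unit,
and the UTF-8 decoder is deliberately simplified (1- and 2-byte
sequences only, no validation). The same two bytes produce different
UTF-16 text depending on which encoding is assumed:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration only; not libxsd or Xerces-C++ API.

// Treat every byte as one ISO-8859-1 (Latin-1) character.
std::u16string from_latin1 (const std::string& s)
{
  std::u16string r;
  for (unsigned char c : s)
    r.push_back (static_cast<char16_t> (c));
  return r;
}

// Treat the bytes as UTF-8. Simplified: only 1- and 2-byte sequences
// are decoded; anything else becomes U+FFFD. No validation is done.
std::u16string from_utf8 (const std::string& s)
{
  std::u16string r;
  for (std::size_t i = 0; i < s.size (); ++i)
  {
    unsigned char c = static_cast<unsigned char> (s[i]);

    if (c < 0x80)
      r.push_back (static_cast<char16_t> (c));
    else if ((c & 0xE0) == 0xC0 && i + 1 < s.size ())
    {
      unsigned char d = static_cast<unsigned char> (s[++i]);
      r.push_back (static_cast<char16_t> (((c & 0x1F) << 6) | (d & 0x3F)));
    }
    else
      r.push_back (static_cast<char16_t> (0xFFFD));
  }
  return r;
}
```

Given the bytes 0xC3 0xA9, from_utf8() yields the single code unit
U+00E9 while from_latin1() yields two code units, U+00C3 and U+00A9.
Without knowing the encoding there is no way to pick between them.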


> This macro would allow a user to over-ride the base transcode() 
> behaviour.

I think it is a good idea to allow something like this. However,
it is not clear how to support such overriding. Currently the
transcode and transcode_to_xmlch functions are defined in libxsd
and used throughout the runtime and generated code. They are
function templates, so the "global variable that contains the
pointer to the transcode function" approach won't work. The only
way that I can think of that will work is to completely remove
those definitions if, say, XSD_USE_CUSTOM_ENCODING is defined.
It will then be the user's responsibility to include suitable
implementations at the beginning of the generated code using, for
example, the --hxx-prologue option. There are two problems with
this approach:

1. If some libxsd headers are included directly, one will need
   to remember to include the transcode definitions before that.

2. Element/attribute names in the generated code are in UTF-8.
   This probably won't be a big deal since 99.9% of such names
   are in ASCII. However, I have seen schemas with enumeration
   values that contain non-ASCII characters, and this can be a
   problem for such schemas.

Any thoughts?


> It is powerful to allow different encodings [between the model 
> and the resultant XML], but I don't think that a majority of XSD 
> users are using this?

So far there have been a few requests to allow selecting the
ISO-Latin-1 encoding instead of UTF-8, which we will probably
implement for the next release. But since we don't plan to support
all possible encodings, I am quite interested in a generic solution
like the above.

Boris



