[xsd-users] Non-xml data in attributes with xsd:string, xsd:normalizedString

Bill Pringlemeir bpringle at sympatico.ca
Tue Sep 22 23:27:28 EDT 2009


On 26 Jun 2009, boris at codesynthesis.com wrote:

> Bill Pringlemeir <bpringle at sympatico.ca> writes:

>> I guess that Xerces mandates a UTF-? encoding.  Does Xerces then
>> convert to the specified encoding, for instance 'US-ASCII'?

> There are three encodings at play here; the second is normally not
> visible to the end-user, except in some situations:

> 1. The encoding in the object model. This is by default UTF-8.

> 2. The UTF-16 encoding used in Xerces-C++.

> 3. The encoding of the resulting XML document. This can be specified
> in the serialization function.

> Conversion between (1) and (2) is performed by the object model,
> between (2) and (3) -- by Xerces-C++.

[snip]

> I think the simplest way would be to use XSD_USE_LCP and set the
> local code page to US-ASCII. But the availability of this approach
> depends on the OS(es) you are targeting. Otherwise you will need
> to make sure your binary data is properly UTF-8-encoded (i.e.,
> characters above 0x7F are replaced with two-byte sequences). 
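
The "two-byte sequences" mentioned above can be sketched as follows.
This is only an illustration, assuming each byte of the binary data is
treated as an ISO-8859-1 code point; the function name latin1_to_utf8
is hypothetical and not part of XSD or Xerces-C++.

```cpp
#include <string>

// Hypothetical helper: treat each input byte as an ISO-8859-1 (Latin-1)
// code point and emit its UTF-8 encoding.
std::string latin1_to_utf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size() * 2);  // worst case: every byte becomes two
    for (unsigned char c : in) {
        if (c < 0x80) {
            // 7-bit characters are identical in ASCII and UTF-8.
            out.push_back(static_cast<char>(c));
        } else {
            // Code points 0x80..0xFF need a lead byte and a continuation byte.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

For example, the single byte 0xE9 becomes the two bytes 0xC3 0xA9.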

Using XSD_USE_LCP requires setting the locale, which affects the rest
of the application.  Is layer 1 above UTF-8 only because of the
xsd::cxx::xml::transcode() implementation?  For my case, I rewrote
this function to simply copy 7-bit characters and escape those with
the high bit set.  With that scheme there is no need for any
exceptions.  An 'XSD_USER_TRANSCODE' macro would be more useful than
XSD_USE_LCP (which exists for historical reasons?).  It would allow a
user to override the base transcode() behaviour.
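
A minimal sketch of the kind of replacement described above, under
stated assumptions: the post does not say how high-bit bytes are
escaped, so the textual form \xHH used here is an assumption, as is
the function name escape_transcode.  It stands alone and does not use
the real XSD or Xerces-C++ interfaces; it only shows a conversion from
a std::string to UTF-16 code units in which every input byte is valid.

```cpp
#include <string>

// Hypothetical stand-in for a user-supplied transcode(): bytes below 0x80
// are copied as-is, all others are written as the four characters \xHH.
// No input byte is ever rejected as an invalid sequence.
std::u16string escape_transcode(const std::string& in)
{
    static const char hex[] = "0123456789ABCDEF";
    std::u16string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        if (c < 0x80) {
            out.push_back(static_cast<char16_t>(c));
        } else {
            out.push_back(u'\\');
            out.push_back(u'x');
            out.push_back(static_cast<char16_t>(hex[c >> 4]));
            out.push_back(static_cast<char16_t>(hex[c & 0x0F]));
        }
    }
    return out;
}
```

The output is pure 7-bit text, so it survives serialization to
US-ASCII; a reader of the document would need the matching reverse
mapping to recover the original bytes.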

That would work on both Windows and *nix and would allow a user to
implement an arbitrary conversion between std::string (whatever model
the user has) and layer 2.  Layer 1 *seems* contrived to me.  Is
there some sort of tag matching, etc., that XSD is doing that would
care about the character encoding of data in the model?

Of course UTF-8 is a good choice, but it is probably often the case
that the user wants layers 1 and 3 to use the same encoding.  It is
powerful to allow different encodings [between the model and the
resultant XML], but I don't think that a majority of XSD users are
using this?

Regards,
Bill Pringlemeir.

-- 
Keep things as simple as possible, but no simpler.  - A. Einstein



