[xsd-users] Non-xml data in attributes with xsd:string,
xsd:normalizedString
Bill Pringlemeir
bpringle at sympatico.ca
Tue Sep 22 23:27:28 EDT 2009
On 26 Jun 2009, boris at codesynthesis.com wrote:
> Bill Pringlemeir <bpringle at sympatico.ca> writes:
>> I guess that Xerces mandates a UTF-? encoding. Does Xerces then
>> convert to the specified encoding, for instance 'US-ASCII'?
> There are three encodings at play here, the second is normally not
> visible to the end-user, except for some situations:
> 1. The encoding in the object model. This is by default UTF-8.
> 2. The UTF-16 encoding used in Xerces-C++.
> 3. The encoding of the resulting XML document. This can be specified
> in the serialization function.
> Conversion between (1) and (2) is performed by the object model,
> between (2) and (3) -- by Xerces-C++.
[snip]
> I think the simplest way would be to use XSD_USE_LCP and set the
> local code page to US-ASCII. But the availability of this approach
> depends on the OS(es) you are targeting. Otherwise you will need
> to make sure your binary data is properly UTF-8-encoded (i.e.,
> characters above 0x7F are replaced with two-byte sequences).
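To make that last point concrete: if each byte is taken as a code
point in the range U+0000..U+00FF (an ISO-8859-1 reading, which is my
assumption here), every byte above 0x7F becomes a two-byte UTF-8
sequence. A minimal sketch; bytes_to_utf8 is a hypothetical helper
and not part of XSD or Xerces-C++:

```cpp
#include <string>

// Hypothetical helper, not part of XSD or Xerces-C++.
// Assumption: each input byte is a code point in U+0000..U+00FF
// (ISO-8859-1). Returns the UTF-8 encoding of that sequence.
std::string bytes_to_utf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size() * 2); // worst case: every byte doubles

    for (std::string::size_type i = 0; i < in.size(); ++i)
    {
        unsigned char c = static_cast<unsigned char>(in[i]);

        if (c < 0x80)
        {
            // 7-bit characters are identical in UTF-8.
            out += static_cast<char>(c);
        }
        else
        {
            // 110xxxxx 10xxxxxx: top two bits, then low six bits.
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

For example, the single byte 0xE9 comes out as the pair 0xC3 0xA9.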
The XSD_USE_LCP approach requires setting the locale, which affects
the rest of the application. Is layer 1 above UTF-8 only because of
the xsd::cxx::xml::transcode() implementation? For my case, I rewrote
this function to simply copy 7-bit characters and escape those with
the high bit set. There is no need for any exceptions. An
'XSD_USER_TRANSCODE' macro would be more useful than XSD_USE_LCP
(which exists for historical reasons?). This macro would allow a user
to override the base transcode() behaviour. That would work on both
Windows and *nix and would allow a user to implement an arbitrary
conversion between std::string (whatever model the user has) and
layer 2. Layer 1 *seems* contrived to me. Is there some sort of tag
matching, etc., that XSD is doing that would care about the character
encoding of data in the model?
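To illustrate the kind of replacement I mean, here is a rough sketch
and not the actual code. The name user_transcode, the use of
std::u16string as a stand-in for a Xerces-C++ XMLCh buffer, and the
"\xHH" escape format are all assumptions made for the example:

```cpp
#include <string>

// Hypothetical sketch; not the XSD transcode() implementation.
// std::u16string stands in for a 16-bit XMLCh buffer (layer 2).
std::u16string user_transcode(const std::string& in)
{
    static const char hex[] = "0123456789ABCDEF";

    std::u16string out;
    out.reserve(in.size());

    for (std::string::size_type i = 0; i < in.size(); ++i)
    {
        unsigned char c = static_cast<unsigned char>(in[i]);

        if (c < 0x80)
        {
            // 7-bit characters map one-to-one onto UTF-16 code units.
            out += static_cast<char16_t>(c);
        }
        else
        {
            // Assumed escape format: backslash, 'x', two hex digits.
            out += u'\\';
            out += u'x';
            out += static_cast<char16_t>(hex[c >> 4]);
            out += static_cast<char16_t>(hex[c & 0x0F]);
        }
    }
    return out;
}
```

Every input byte has a defined output, so no encoding error can occur
and nothing needs to be thrown for invalid input. Note that a literal
backslash in the data is copied as-is in this sketch, so the escape is
not reversible without further work.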
Of course UTF-8 is a good choice, but it is probably often the case
that the user wishes layers 1 and 3 to be the same. It is powerful to
allow different encodings [between the model and the resultant XML],
but I doubt that a majority of XSD users need this.
Regards,
Bill Pringlemeir.
--
Keep things as simple as possible, but no simpler. - A. Einstein