[xsd-users] customising XSD to use ICU

Boris Kolpackov boris at codesynthesis.com
Mon Jan 8 02:41:29 EST 2007


Hi Bradley,

Some quick notes before you send the code.

Bradley Beddoes <beddoes at intient.com> writes:

> At the moment I am attempting to customize XSD generated output to take
> advantage of the ICU library from (http://icu.sourceforge.net) to ensure
> we have true unicode support cross platform and aren't relying on the
> terrible wchar_t.

What is so terrible about wchar_t? It is true that it can be 2 bytes
long (e.g., Windows) or 4 bytes long (most UNIXes). XSD detects the size
of wchar_t and uses UTF-16 for a 2-byte wchar_t and UTF-32/UCS-4 for a
4-byte one. If you don't search or test for characters outside the
Basic Multilingual Plane (that is, characters that need a 4-byte
surrogate pair in UTF-16), then you should be fine. You can also write
a small wrapper for ICU if you do need to work with characters outside
of the Basic Multilingual Plane.
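
To make the 2-byte vs. 4-byte difference concrete, here is a minimal
sketch that counts code points in a wchar_t string on either kind of
platform. The helper names (append_code_point, code_point_count) are
made up for this illustration and are not part of XSD or ICU.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: append one code point using the platform's
// wchar_t encoding (UTF-16 if wchar_t is 2 bytes, UTF-32 otherwise).
inline void append_code_point(std::wstring& out, unsigned long cp) {
    if (sizeof(wchar_t) >= 4 || cp < 0x10000ul) {
        out.push_back(static_cast<wchar_t>(cp));
    } else {
        // Outside the BMP with 2-byte wchar_t: emit a surrogate pair.
        cp -= 0x10000ul;
        out.push_back(static_cast<wchar_t>(0xD800ul + (cp >> 10)));
        out.push_back(static_cast<wchar_t>(0xDC00ul + (cp & 0x3FFul)));
    }
}

// Hypothetical helper: number of code points (not code units) in s.
inline std::size_t code_point_count(const std::wstring& s) {
    if (sizeof(wchar_t) >= 4)
        return s.size();  // UTF-32/UCS-4: one unit per code point

    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        unsigned long u = static_cast<unsigned long>(s[i]);
        bool high = u >= 0xD800ul && u <= 0xDBFFul;
        if (high && i + 1 < s.size()) {
            unsigned long v = static_cast<unsigned long>(s[i + 1]);
            if (v >= 0xDC00ul && v <= 0xDFFFul)
                ++i;  // skip the low half of a surrogate pair
        }
        ++n;
    }
    return n;
}
```

For strings that stay within the BMP, s.size() and
code_point_count(s) agree on both kinds of platform, which is why
plain wchar_t code is usually fine.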

Another alternative would be to use char with UTF-8 encoding.


> In particular at the moment I am redefining xsd:string to be represented
> by UnicodeString (
> http://icu.sourceforge.net/apiref/icu4c/classUnicodeString.html ), I may
> look at UDate amongst others as well.

This is not going to be easy. The XSD runtime and generated code assume
an std::basic_string-based string and use string literals (e.g., "foo",
L"foo"). The best you could probably do is to customize all (or most)
of the user-visible API to use ICU UnicodeString but still use a char
(UTF-8) or wchar_t (UTF-16/32) based encoding in the runtime, which may
not be too bad actually. You will probably need to derive from
UnicodeString and provide some constructors to allow implicit
construction from std::basic_string and string literals.
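
A rough sketch of the derive-and-add-constructors idea follows. So
that it compiles without ICU, it uses a stand-in base class
(unicode_string_standin) instead of the real UnicodeString; the
derived class name xsd_string and the helper to_utf16 are likewise
assumptions made for illustration. With ICU you would derive from
UnicodeString itself and would probably use its own conversion
facilities rather than the hand-written loop.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for ICU's UnicodeString (NOT the real class). Like
// UnicodeString, it stores UTF-16 code units.
class unicode_string_standin {
public:
    unicode_string_standin() {}
    explicit unicode_string_standin(const std::u16string& units)
        : units_(units) {}
    std::size_t length() const { return units_.size(); }
    char16_t char_at(std::size_t i) const { return units_[i]; }
private:
    std::u16string units_;
};

// Hypothetical derived type with non-explicit constructors, so that
// std::wstring values and L"..." literals convert implicitly.
class xsd_string : public unicode_string_standin {
public:
    xsd_string() {}
    xsd_string(const std::wstring& s) : unicode_string_standin(to_utf16(s)) {}
    xsd_string(const wchar_t* s) : unicode_string_standin(to_utf16(s)) {}
private:
    // Convert the platform's wchar_t encoding to UTF-16.
    static std::u16string to_utf16(const std::wstring& s) {
        std::u16string out;
        for (wchar_t c : s) {
            unsigned long cp = static_cast<unsigned long>(c);
            if (sizeof(wchar_t) == 2 || cp < 0x10000ul) {
                out.push_back(static_cast<char16_t>(cp));
            } else {
                cp -= 0x10000ul;  // 4-byte wchar_t, outside the BMP
                out.push_back(static_cast<char16_t>(0xD800ul + (cp >> 10)));
                out.push_back(static_cast<char16_t>(0xDC00ul + (cp & 0x3FFul)));
            }
        }
        return out;
    }
};

// Example of an API that takes the custom type by const reference.
inline std::size_t unit_count(const xsd_string& s) { return s.length(); }
```

With this in place, a call such as unit_count(L"foo") compiles
because the literal converts through a single user-defined
constructor, which is what code that passes literals and
std::basic_string values around would rely on.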

Another alternative would be to use std::basic_string with the ICU
Unicode character type by using the --char-type option (you will need
to specialize std::char_traits for this type). There could still be
issues with character literals though.
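
For reference, a std::char_traits specialization could look roughly
like the sketch below. The character type uchar16 is a hypothetical
stand-in (a scoped enum over a 16-bit integer) and not ICU's actual
character type; the names ustring and from_ascii are also made up.
The from_ascii helper hints at the literal problem: "foo" and L"foo"
do not convert to such a string on their own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <cwchar>
#include <ios>
#include <string>

// Hypothetical 16-bit character type (stand-in, not ICU's type).
enum class uchar16 : std::uint16_t {};

namespace std {
template <>
struct char_traits<uchar16> {
    typedef uchar16        char_type;
    typedef std::uint32_t  int_type;
    typedef std::streamoff off_type;
    typedef std::streampos pos_type;
    typedef std::mbstate_t state_type;

    static void assign(char_type& d, const char_type& s) { d = s; }
    static bool eq(char_type a, char_type b) { return a == b; }
    static bool lt(char_type a, char_type b) { return a < b; }

    static int compare(const char_type* a, const char_type* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return 1;
        }
        return 0;
    }
    static std::size_t length(const char_type* s) {
        std::size_t n = 0;
        while (!eq(s[n], char_type())) ++n;
        return n;
    }
    static const char_type* find(const char_type* s, std::size_t n,
                                 const char_type& c) {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], c)) return s + i;
        return nullptr;
    }
    static char_type* move(char_type* d, const char_type* s, std::size_t n) {
        if (n) std::memmove(d, s, n * sizeof(char_type));
        return d;
    }
    static char_type* copy(char_type* d, const char_type* s, std::size_t n) {
        if (n) std::memcpy(d, s, n * sizeof(char_type));
        return d;
    }
    static char_type* assign(char_type* d, std::size_t n, char_type c) {
        for (std::size_t i = 0; i < n; ++i) d[i] = c;
        return d;
    }
    static int_type to_int_type(char_type c) { return static_cast<int_type>(c); }
    static char_type to_char_type(int_type i) { return static_cast<char_type>(i); }
    static bool eq_int_type(int_type a, int_type b) { return a == b; }
    static int_type eof() { return static_cast<int_type>(-1); }
    static int_type not_eof(int_type i) { return eq_int_type(i, eof()) ? 0 : i; }
};
}  // namespace std

typedef std::basic_string<uchar16> ustring;

// Hypothetical helper: widen an ASCII literal, since no literal
// syntax produces uchar16 directly.
inline ustring from_ascii(const char* s) {
    ustring out;
    for (; *s; ++s)
        out.push_back(static_cast<uchar16>(static_cast<unsigned char>(*s)));
    return out;
}
```

The specialization is placed before the first use of
std::basic_string<uchar16>, so that the string class picks it up.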


> Firstly xml:lang as it seems simple enough, I can't seem to work out how
> to customize that type "--custom-type lang" does not seem to provide
> anything in my generated header at all, it seems to be generated as a
> struct internally to for example localizedURIType in the below schema.

Are you using the --morph-anonymous option? You will need to compile
xml.xsd with this option and --custom-type lang. The result will be a
forward declaration of struct lang; you will have to provide a custom
implementation.


> With regards to enums these appear to have the same constructor problem
> as noted above (use of basic_string<char>). Additionally I seem to be
> missing a non equivalence operator and I can't seem to figure out
> exactly what I need to implement (UnicodeString defines this method so
> it's not that).

The problem with enums is that they use std::basic_string internally
as well as an array of string literals for enumerators. The only way
to overcome this (without using one of the strategies outlined above)
is to completely customize every string-based enum.
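
To give an idea of what a completely customized string-based enum
involves, here is a minimal hand-written sketch. It is not what XSD
generates: the type name color, its enumerators, and the text typedef
(std::u16string standing in for UnicodeString) are all assumptions
for illustration. It keeps its own table of enumerator literals and
supplies both == and !=.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

typedef std::u16string text;  // stand-in for ICU's UnicodeString

// Hypothetical hand-written replacement for a string-based enum.
class color {
public:
    enum value { red, green, blue };

    color(value v) : v_(v) {}
    explicit color(const text& s) : v_(parse(s)) {}

    value get() const { return v_; }
    text str() const { return names()[v_]; }

private:
    // Enumerator literals in the custom string's character type.
    static const char16_t* const* names() {
        static const char16_t* const n[] = { u"red", u"green", u"blue" };
        return n;
    }
    static value parse(const text& s) {
        for (std::size_t i = 0; i < 3; ++i)
            if (s == names()[i]) return static_cast<value>(i);
        throw std::invalid_argument("unknown enumerator");
    }
    value v_;
};

inline bool operator==(const color& a, const color& b) {
    return a.get() == b.get();
}
inline bool operator!=(const color& a, const color& b) {
    return !(a == b);
}
```

Because the constructor from value is not explicit, comparisons such
as c != color::red work through the two operators above.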

hth,
-boris
