[xsd-users] xsd internationalization.

Boris Kolpackov boris at codesynthesis.com
Wed Jun 6 03:42:08 EDT 2012


Hi Dmitry,

Dmitry Chernov <diman4ik.chernov at gmail.com> writes:

> In our software we need to generate XML with Cyrillic letters from
> a Cyrillic XSD. Maybe it would be reasonable to have two versions of
> the XSD: one plain Latin and one Cyrillic. The people responsible for
> creating the XSD would write the Cyrillic one, and we would
> transliterate it with a script. Then we would generate C++ classes
> from the Latin XSD with xsd. Then, at runtime, our software would
> serialize the C++ objects to Cyrillic XML. Is it possible to add such
> a capability to the xsd lib?

Let me first clarify that we are talking about supporting Cyrillic
element/attribute/type names and not Cyrillic XML content (i.e.,
the values of elements/attributes). The latter works fine right
now. 
 
The problem with Cyrillic element/attribute/type names is that C++
only supports Latin characters in identifiers, so the XSD compiler
replaces all non-Latin characters with '_'. The result is C++ names
like 'cxx___', which are not very usable. See also this earlier thread
for more context:

http://www.codesynthesis.com/pipermail/xsd-users/2012-June/003653.html

Now, the type names in XSD normally don't get exposed in XML (xsi:type
is the exception) so those could be changed to Latin without any changes
to XML.

Element/attribute names, however, do end up in the resulting XML. Your
suggestion won't work because if you use the Latin version of your
schema to generate the C++ model, then the generated code will expect
Latin element/attribute names in XML. One way to work around this
would be to pre-process the DOM document before passing it on to
the XSD-generated code by changing all the Cyrillic element/attribute
names to the Latin ones (the same can also be done for serialization,
except in the other direction). This can actually be implemented right
now with a bit of effort.
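
To give a rough idea of the pre-processing step, here is a minimal
sketch. It assumes a one-to-one table of Cyrillic-to-Latin names and,
to stay self-contained, uses a hypothetical Element struct as a
stand-in for the real DOM tree. The names Element, NameMap, invert,
translate, and rename_names are made up for illustration and are not
part of XSD or Xerces-C++.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DOM element. Real code would walk the
// DOM document that is passed to the XSD-generated parsing functions.
struct Element
{
  std::string name;
  std::vector<std::pair<std::string, std::string>> attributes; // name, value
  std::vector<Element> children;
};

// Maps a name as it appears in one form (e.g., Cyrillic) to the other.
using NameMap = std::map<std::string, std::string>;

// Build the reverse (Latin -> Cyrillic) table for use before serialization.
NameMap invert (const NameMap& m)
{
  NameMap r;
  for (const auto& p: m)
  {
    bool inserted (r.emplace (p.second, p.first).second);
    assert (inserted && "name mapping must be one-to-one");
    (void) inserted; // Avoid an unused variable warning with NDEBUG.
  }
  return r;
}

// Return the mapped name or the original name if there is no entry.
std::string translate (const NameMap& m, const std::string& n)
{
  auto i (m.find (n));
  return i != m.end () ? i->second : n;
}

// Recursively rename elements and attributes. Values are left untouched.
void rename_names (Element& e, const NameMap& m)
{
  e.name = translate (m, e.name);

  for (auto& a: e.attributes)
    a.first = translate (m, a.first);

  for (auto& c: e.children)
    rename_names (c, m);
}
```

Before parsing you would call rename_names() with the Cyrillic-to-Latin
table, and before writing the XML out you would call it with the table
returned by invert(). With a real Xerces-C++ DOM document the renaming
itself could probably be done with DOMDocument::renameNode(), though I
haven't tried this.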

One way to support this directly in XSD would be something similar to
the --reserved-name option, which allows you to map any name that
is used as a C++ identifier to some alternative name:

--reserved-name <cyrillic>=<latin>

This approach doesn't work right now because we don't support anything
except Latin characters in this option. Fixing this in a portable way
won't be trivial either, so we would rather not go this way.

Another alternative would be to do something similar to what we have
done to support custom C++ literals (see the --custom-literals option).
Essentially, we would add another option, say --custom-identifiers,
with which you would be able to pass an XML file that contains mappings
between the Cyrillic names and the Latin names that should be used
instead in the C++ code.

Boris


