[xsd-users] Non-xml data in attributes with xsd:string,
xsd:normalizedString
Bill Pringlemeir
bpringle at sympatico.ca
Wed Feb 18 21:07:13 EST 2009
I believe this problem is generic to C++ tree and not specific to any
architecture [I am using RHEL 4.x, AIX 5.x, and debian lenny on
(x86_64, Power3/4 and northwood chips)].
We are using data that is 99.9999% printable. However, there are
occasional characters that cause problems in attribute values, for
example tab, LF, CR, etc. I see that normalizedString does seem to
attempt to convert these to spaces, but only on parsing?
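
For reference, the whitespace replacement that xs:normalizedString
implies (each tab, LF and CR becomes a space) could be applied by hand
to a value already held in memory. This is only a minimal sketch in
standard C++; replace_whitespace is a hypothetical helper name and is
not part of the generated code or the XSD runtime.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper, not part of the XSD runtime: replaces every
// tab, LF and CR in the value with a single space, which is the
// "replace" whitespace rule that xs:normalizedString specifies.
std::string replace_whitespace(std::string value)
{
    std::replace_if(value.begin(), value.end(),
                    [](char c) { return c == '\t' || c == '\n' || c == '\r'; },
                    ' ');
    return value;
}
```

Calling it on a string before appending to 'a1()' would drop the tabs
rather than preserve them, so it only helps when losing the original
whitespace is acceptable.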
The schema can be converted to use a binary type, but then the XML is
unreadable and inefficient. As most of the data is printable, an
occasional escape is more efficient.
I have a full test case, but it is briefer to just introduce the
XML and schema.
[test.xsd]
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType name="normTest">
<xs:sequence>
<xs:element name="e1" type="xs:string"/>
</xs:sequence>
<xs:attribute name="a1" type="xs:string" default="0" />
</xs:complexType>
<xs:element name="normTest" type="normTest"/>
</xs:schema>
[test.xml]
<?xml version="1.0" encoding="ASCII" ?>
<normTest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="test.xsd" a1="test
">
<e1>blah test value
	-</e1>
</normTest>
Harness code along the lines of,
normTest copy("file"); //...
copy.e1().append("\t\t\t\t\t\tno\ttabs?");
copy.a1().append("\t\t\t\t\t\tno\ttabs?");
This can generate serialization errors (actually parser errors on the
other side, but the real error was generating invalid XML, unless it
is a program error to put non-printables in the std::string?).
Can I force the accessors 'a1()' and 'e1()' to escape strings? Can I
customize just the 'operator<<()' to escape the data? Escaping on
serialization seems the most efficient. Most parsers on the other
side will barf if we include non-printable characters in the
xsd:string space. Using xs:normalizedString would be acceptable, but
it also seems to exhibit the same problems. It just enforces that
input is normalized; it does not guard against someone having
programmatically altered the xsd_schema::string (aliased to
std::string, etc.) with some non-XML data.
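
To make the kind of escaping I mean concrete, here is a minimal sketch
in standard C++ of what a double-quoted attribute value would need to
look like on the wire. escape_attribute is a hypothetical helper, not
an XSD or Xerces-C++ function. It assumes XML 1.0, where tab, LF and
CR survive in an attribute only as character references, and where the
other control characters below 0x20 have no legal representation at
all.

```cpp
#include <string>

// Hypothetical helper, not part of the XSD runtime. Writes into 'out'
// the form 'in' would need inside a double-quoted XML 1.0 attribute.
// Returns false if 'in' contains a control character that XML 1.0
// cannot represent, even as a character reference.
// Assumption: bytes >= 0x80 are passed through unchanged; whether they
// are valid depends on the document encoding.
bool escape_attribute(const std::string& in, std::string& out)
{
    out.clear();
    for (unsigned char c : in) {
        switch (c) {
        case '&':  out += "&amp;";  break;
        case '<':  out += "&lt;";   break;
        case '"':  out += "&quot;"; break;
        case '\t': out += "&#x9;";  break;  // kept as a real tab by the parser
        case '\n': out += "&#xA;";  break;
        case '\r': out += "&#xD;";  break;
        default:
            if (c < 0x20)
                return false;               // e.g. 0x01: not allowed in XML 1.0
            out += static_cast<char>(c);
        }
    }
    return true;
}
```

Note that feeding the result back through 'a1()' would likely not
help, since a serializer would escape the '&' a second time; the sketch
only shows the output that escaping at serialization time would have
to produce.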
Elements seem to handle this better, but the attributes provide a much
better mapping in C++ space [ie, cardinal 'one'].
Thanks.
Bill Pringlemeir.