[xsd-users] Eliminate meaningless whitespace in text elements?
Boris Kolpackov
boris at codesynthesis.com
Wed Mar 3 06:18:43 EST 2010
Hi Ian,
Durkan, Ian <ian.durkan at progeny.net> writes:
> In our schema, there's an element "description" that's type string.
> During parsing, is there any way to have xsd/tree automatically
> eliminate leading, trailing, and meaningless internal whitespace? For
> example, the element might be set up like this in a document:
>
> <description>
>
> This description
>
> covers multiple lines.
>
> </description>
>
> When calling the description() method of the element's parent, it
> returns a string containing the spaces and newlines verbatim from the
> document.
That's the expected behavior: the semantics of the xsd:string type
require that all whitespace be preserved.
> From what I'm reading across the web, this is impossible for xerces-c
> alone: the schema would need to set description's type to "token" instead
> of "string".
Correct, there is no automatic way to get the required behavior in
Xerces-C++ except by changing the schema. You could also "post-process"
the DOM document manually and collapse the whitespace in the
specific element(s) before handing it off to C++/Tree.
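A minimal sketch of the collapse step itself, assuming plain std::string
and C++11. The function name collapse_whitespace is hypothetical and is not
part of Xerces-C++ or XSD; it follows the XML Schema "collapse" whitespace
rule that xsd:token uses. Locating the DOM text nodes and writing the result
back is left out here.

```cpp
#include <string>

// Collapse whitespace the way the XML Schema "collapse" whiteSpace facet
// does: tab, LF, and CR are treated as spaces, runs of spaces become a
// single space, and leading/trailing spaces are dropped.
// Hypothetical helper, not part of Xerces-C++ or XSD.
std::string
collapse_whitespace (const std::string& s)
{
  std::string r;
  r.reserve (s.size ());
  bool pending_space = false; // Whitespace seen after some output.

  for (char c : s)
  {
    if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
    {
      // Remember the space only if something was already written;
      // this is what drops leading whitespace.
      pending_space = !r.empty ();
    }
    else
    {
      if (pending_space)
      {
        r += ' ';
        pending_space = false;
      }
      r += c;
    }
  }

  // A pending space at the end is never written, which drops
  // trailing whitespace.
  return r;
}
```

Applied to the content of the description element from the example
document, this would give "This description covers multiple lines."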
> Can xsd/tree take care of this, or does it come down to choosing between
> dropping whitespace ourselves, or changing the schema?
Unfortunately, there is no automatic way to do this in C++/Tree either.
There are a few "manual" options available on this level:
1. You can assign the value returned by description() to xml_schema::token,
which will collapse all the whitespace:
  xml_schema::token t = x.description ();
2. You can customize the type containing the description element and
"override" the description() accessor to return the collapsed value.
Let's say the type containing the description element is called X.
Then you request the generation of the original mapping for X as
X_base (see the --custom-type option) and then provide the custom
implementation as:
  class X: public X_base
  {
    // Copy c-tors and _clone from X_base and simply forward the
    // arguments to X_base.

  public:
    // Returns by value: the collapsed string is a temporary, so
    // returning a reference to it would dangle.
    description_type
    description () const
    {
      xml_schema::token t = X_base::description ();
      return description_type (t);
    }
  };
For more information on type customization see the C++/Tree Mapping
Customization Guide:
http://wiki.codesynthesis.com/Tree/Customization_guide
See also the examples in the examples/cxx/tree/custom/ directory.
3. Customize the mapping for the xsd:string XML Schema built-in type to
always collapse whitespace. This is a pervasive change in that every
element and attribute that uses this type will have its whitespace
collapsed. This method is primarily useful when you have a lot of
elements like description and all the other elements/attributes that
use xsd:string are whitespace-insensitive.
Boris