[xsd-users] Eliminate meaningless whitespace in text elements?

Boris Kolpackov boris at codesynthesis.com
Wed Mar 3 06:18:43 EST 2010


Hi Ian,

Durkan, Ian <ian.durkan at progeny.net> writes:

> In our schema, there's an element "description" that's type string.
> During parsing, is there any way to have xsd/tree automatically
> eliminate leading, trailing, and meaningless internal whitespace?  For
> example, the element might be set up like this in a document:
> 
> <description>
> 
>     This description
> 
>     covers multiple lines.
> 
> </description>
> 
> When calling the description() method of the element's parent, it
> returns a string containing the spaces and newlines verbatim from the
> document.

That's the expected behavior: the semantics of the xsd:string type
require that all whitespace be preserved.


> From what I'm reading across the web, this is impossible for xerces-c
> alone; the schema would need to set description's type to "token" instead
> of "string".

Correct, there is no automatic way to get the required behavior in
Xerces-C++ except by changing the schema. You could also "post-process"
the DOM document manually and collapse the whitespace in the specific
element(s) before handing it off to C++/Tree.


> Can xsd/tree take care of this, or does it come down to choosing between 
> dropping whitespace ourselves, or changing the schema?

Unfortunately, there is no automatic way to do this in C++/Tree either.
There are a couple of "manual" options available on this level:

1. You can assign the value returned by description() to xml_schema::token,
   which will collapse the whitespace:

   xml_schema::token t = x.description ();

2. You can customize the type containing the description element and
   "override" the description() accessor to return the collapsed value.
   Let's say the type containing the description element is called X.
   Then you request the generation of the original mapping for X as
   X_base (see the --custom-type option) and then provide the custom
   implementation as:

   class X: public X_base
   {
     // Copy c-tors and _clone from X_base and simply forward the
     // arguments to X_base.

   public:
     // Return by value: the collapsed string is a temporary, so
     // returning a reference to it would dangle.
     //
     description_type
     description () const
     {
       xml_schema::token t = X_base::description ();
       return description_type (t);
     }
   };

   For more information on type customization see the C++/Tree Mapping 
   Customization Guide:

   http://wiki.codesynthesis.com/Tree/Customization_guide

   As well as the examples in the examples/cxx/tree/custom/ directory.

3. Customize the mapping for the xsd:string XML Schema built-in type to
   always collapse whitespace. This is a pervasive change in that every
   element and attribute that uses this type will have its whitespace
   collapsed. This method is primarily useful when you have a lot of
   elements like description and all the other elements/attributes that
   use xsd:string are whitespace-insensitive.
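
Whichever option you pick, it may help to see what "collapse" means in
XML Schema terms: tabs, newlines, and carriage returns are treated as
spaces, runs of spaces are reduced to one, and leading and trailing
spaces are dropped. Below is a minimal sketch of that rule on a plain
std::string. Note that collapse() is a hypothetical helper written for
this illustration; it is not part of XSD or Xerces-C++, and it assumes
the narrow (char) character type.

```cpp
#include <string>

// Hypothetical helper (not part of XSD or Xerces-C++): applies the
// XML Schema whiteSpace="collapse" rule to a narrow string.
//
std::string
collapse (const std::string& s)
{
  std::string r;
  bool pending = false; // Whitespace seen since the last kept character.

  for (char c : s)
  {
    if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
    {
      pending = true;
      continue;
    }

    // Emit a single separating space, but never a leading one.
    //
    if (pending && !r.empty ())
      r += ' ';

    pending = false;
    r += c;
  }

  // Trailing whitespace is never emitted since a space is only added
  // right before the next non-whitespace character.
  //
  return r;
}
```

Assuming the default char-based mapping, it could be used along the
lines of std::string d = collapse (x.description ()); or inside the
custom accessor from option 2.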

Boris


