[xsd-users] Eliminate meaningless whitespace in text elements?

Durkan, Ian ian.durkan at progeny.net
Tue Mar 2 18:50:20 EST 2010


Greetings,

 

We're being challenged by an issue with whitespace within text nodes in
documents we're parsing using the combination of the xerces-c DOM parser
(specifically a DOMLSParser, to allow reuse of the schema for validation)
followed by xsd/tree processing.  Put simply, we get the DOMDocument from
the DOMLSParser, then hand it off to xsd/tree.

 

In our schema, there's an element "description" of type string.
During parsing, is there any way to have xsd/tree automatically
eliminate leading, trailing, and meaningless internal whitespace?  For
example, the element might be set up like this in a document:

 

<description>

    This description

    covers multiple lines.

</description>

 

When calling the description() method of the element's parent, it
returns a string containing the spaces and newlines verbatim from the
document.

 

From what I'm reading across the web, this is impossible for xerces-c
alone: the schema would need to set description's type to "token" instead
of "string".  Can xsd/tree take care of this, or does it come down to
choosing between dropping whitespace ourselves and changing the schema?
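
For reference, the "dropping whitespace ourselves" option might look
roughly like the sketch below. It assumes the goal is the same result the
schema's "collapse" whitespace handling gives for the "token" type:
leading and trailing whitespace removed, and each internal run of spaces,
tabs, and line breaks reduced to a single space. The name
collapse_whitespace is a hypothetical helper of our own, not something
provided by xsd/tree or xerces-c.

```cpp
#include <string>

// Hypothetical helper (not part of xsd/tree or xerces-c).
// Strips leading/trailing whitespace and reduces every internal run of
// space, tab, CR, or LF to a single space.
std::string collapse_whitespace(const std::string& in)
{
    std::string out;
    out.reserve(in.size());

    bool pending_space = false;  // a whitespace run is waiting to be emitted

    for (char c : in)
    {
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
        {
            // Only remember the gap if text already precedes it;
            // this is what drops leading whitespace.
            pending_space = !out.empty();
        }
        else
        {
            // Emit the remembered gap as one space, then the character.
            // A gap at the very end is never emitted, so trailing
            // whitespace is dropped as well.
            if (pending_space)
            {
                out += ' ';
                pending_space = false;
            }
            out += c;
        }
    }

    return out;
}
```

Applied to the example element above, this should turn the value returned
by description() into "This description covers multiple lines.",
assuming that value is, or converts to, a narrow-character std::string.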

 

Ian Durkan

ian . durkan  <at> progeny . net

 


