[xsd-users] Delay loading

Boris Kolpackov boris at codesynthesis.com
Tue Jun 13 15:25:18 EDT 2006


Hi Andrew,

Andrew Ward <andy.ward at hevday.com> writes:

> Could anyone suggest any strategies to allow the creation of a C++/Tree
> XSD DOM but have the parsing be done on demand, when the actual elements
> are accessed?
> I am working with large XML documents that can take more than 10 seconds
> to load into the DOM but before any user action I typically only access
> a couple of top level attributes.

Hm, there is no such thing in the Xerces-C++ DOM parser. Some tricks
could be possible when parsing DOM to the C++/Tree representation,
however. It is therefore important to figure out first where the
majority of the time is being spent. Three main things happen when
one parses XML to the C++/Tree representation: XML Schema validation,
XML to DOM parsing, and DOM to the C++/Tree representation parsing.

A quick way to measure the time it takes to parse XML to DOM is by
calling one of the internal parsing functions in the XSD runtime:

#include <xercesc/util/PlatformUtils.hpp>

#include <xsd/cxx/xml/dom/elements.hxx>
#include <xsd/cxx/tree/error-handler.hxx>

int
main ()
{
  xercesc::XMLPlatformUtils::Initialize ();

  {
    // The inner scope destroys prop and eh before Terminate () is called.
    xsd::cxx::xml::properties<char> prop;
    xsd::cxx::tree::error_handler<char> eh;

    xsd::cxx::xml::dom::parse ("test.xml", eh, prop, false); // Measure this.
  }

  xercesc::XMLPlatformUtils::Terminate ();
}

This parses test.xml to DOM without XML Schema validation (the last
argument to parse() is false). You can also measure how long it takes
with validation.
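
To take the measurement itself, a small timing wrapper is probably
enough. The following is only a sketch: time_ms is a hypothetical
helper (it is not part of XSD or Xerces-C++), and it assumes a
compiler with C++11 <chrono> and lambda support.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not part of XSD or Xerces-C++): calls f once and
// returns the elapsed wall-clock time in milliseconds.
template <typename F>
double
time_ms (F&& f)
{
  typedef std::chrono::steady_clock clock_type;

  auto start = clock_type::now ();
  std::forward<F> (f) ();
  auto end = clock_type::now ();

  return std::chrono::duration<double, std::milli> (end - start).count ();
}

// Usage sketch, assuming prop and eh are set up as in the code above:
//
//   double no_val = time_ms ([&] {
//     xsd::cxx::xml::dom::parse ("test.xml", eh, prop, false);
//   });
//
//   double val = time_ms ([&] {
//     xsd::cxx::xml::dom::parse ("test.xml", eh, prop, true);
//   });
```

A single run can be skewed by disk caching, so it is probably worth
repeating each measurement a few times and comparing the results.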

If most of your time is spent in XML to DOM parsing (without the
validation), then I don't think there is much that can be done about
it except perhaps using a faster parser (like libxml2). If most of the
time is spent in the validation, then you have a number of options:

1. Disable the XML Schema validation altogether. This option is viable
   if you are sure your XML documents are valid (because they were created
   by your application, for instance). Note that the generated code also
   includes a number of checks that will prevent creation of inconsistent
   representations (e.g., missing required elements/attributes, etc.).

2. Pre-load, cache, or pre-parse the XML Schema grammar. This is described
   in detail in this article:

   http://www-128.ibm.com/developerworks/webservices/library/x-xsdxerc.html


Finally, if you find that most of the time is spent in the DOM to C++/Tree
representation phase, then I will be very surprised ;-). Let me know if
that's the case and I will come up with something.
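
One way to attribute the total load time to the three phases is by
subtraction. This is a sketch under an assumption: that the phases
simply add up, so the cost of validation is the difference between the
two DOM timings, and the cost of the DOM to C++/Tree phase is whatever
remains of the complete load. phase_times and split_phases are
hypothetical names, not part of XSD.

```cpp
#include <algorithm>

// Hypothetical helper: per-phase costs, in the same unit as the inputs.
struct phase_times
{
  double dom;        // XML to DOM parsing, no validation
  double validation; // extra cost of XML Schema validation
  double tree;       // DOM to C++/Tree
};

// dom_only:  time to parse XML to DOM without validation
// dom_valid: time to parse XML to DOM with validation
// total:     time of the complete load into C++/Tree, with validation
inline phase_times
split_phases (double dom_only, double dom_valid, double total)
{
  phase_times r;
  r.dom = dom_only;

  // Clamp at zero: measurement noise can make a difference slightly
  // negative.
  r.validation = std::max (0.0, dom_valid - dom_only);
  r.tree = std::max (0.0, total - dom_valid);

  return r;
}
```

With made-up timings of 4000, 9500, and 10200 milliseconds, this gives
4000 for XML to DOM parsing, 5500 for validation, and 700 for the DOM
to C++/Tree phase, which would point at validation as the place to
look first.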

hth,
-boris
