[xsd-users] Large XSD-schema, speed and identity constraint validation
Boris Kolpackov
boris at codesynthesis.com
Thu May 14 08:43:28 EDT 2020
Stefan de Konink <stefan at konink.de> writes:
> >Also, keep in mind that CodeSynthesis XSD delegates XML Schema validation,
> >including identity constraint validation, to Xerces-C++.
>
> Does this practically mean that if I would only care about XSD-validation,
> there would not be any net benefit to use the XSD toolset, because the
> resulting code is not used to generate a specific parser that is employed
> while doing a XSD validation? I am thinking in the direction of XML Screamer
> research.
Correct. Validation in generated code (also called a "perfect parser") works
well for smaller/simpler schemas (which is the reason why we went this way
for XSD/e, our mobile/embedded version). But for the schemas we are talking
about (e.g., GML), the size of the generated code becomes impractical
in many cases.
> >>real 17m6.611s
> >>user 17m1.399s
> >>sys 0m3.917s
> >>
> >>real 5m21.199s
> >>user 5m19.587s
> >>sys 0m1.450s
> >
> >I am confused, what are these two results for? Hot vs cold?
>
> Same machine, same data, multiple runs, same code, showing the min and max.
> From my benchmarking background I would consider them both cold. I cannot
> explain (other than hardware reasons, tested it on a laptop Ryzen 2500U) why
> the results give huge outliers for both libxml2 and xerces-c. I cannot
> exclude the initial loading (i/o) of the XSD-schema either.
Do you perhaps have remote (e.g., http://) schema references in (some
of) your schemaLocation attributes? That would explain these results
quite well.
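
One quick way to check for remote references is to scan the instance and
schema documents for schemaLocation values that use a network URL scheme.
The following is only a minimal sketch in Python (standard library only);
the function names and the set of schemes treated as "remote" are my
assumptions, not part of XSD or Xerces-C++:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
XS_NS = "http://www.w3.org/2001/XMLSchema"

# Assumption: these URL schemes are the ones that imply a network fetch.
REMOTE_SCHEMES = {"http", "https", "ftp"}

# Schema elements that carry an unqualified schemaLocation attribute.
SCHEMA_REFS = {f"{{{XS_NS}}}{name}" for name in ("import", "include", "redefine")}


def is_remote(location):
    """True if the location uses one of the assumed remote URL schemes."""
    return urlparse(location).scheme.lower() in REMOTE_SCHEMES


def remote_schema_locations(xml_text):
    """Return remote locations referenced from one XML or XSD document."""
    root = ET.fromstring(xml_text)
    found = []
    for elem in root.iter():
        # Instance documents: xsi:schemaLocation is a whitespace-separated
        # list of (namespace, location) pairs, so locations are the odd items.
        pairs = elem.get(f"{{{XSI_NS}}}schemaLocation", "").split()
        found += [loc for loc in pairs[1::2] if is_remote(loc)]

        # Instance documents: xsi:noNamespaceSchemaLocation is one location.
        no_ns = elem.get(f"{{{XSI_NS}}}noNamespaceSchemaLocation")
        if no_ns and is_remote(no_ns):
            found.append(no_ns)

        # Schema documents: xs:import, xs:include and xs:redefine.
        if elem.tag in SCHEMA_REFS:
            loc = elem.get("schemaLocation")
            if loc and is_remote(loc):
                found.append(loc)
    return found
```

Running this over every schema file that gets loaded (imports and includes
are followed transitively by the validator, so each file has to be checked)
should show whether any part of the schema set is fetched over the network
on each run.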
> So I am missing the "Key/Value" report but get an ocean of duplicates where
> I can't find out the reason.
I haven't looked into this in detail, but maybe you can resolve the schema
names referenced in the error message back to schema locations based on
the loaded schema grammar.
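
As a rough offline approximation of that idea, and not the Xerces-C++
grammar itself, one could build an index from target namespace to the
schema files that declare it, then look up the name from the error message
in that index. This sketch assumes the "schema names" in the messages are
namespace URIs; the function name and the input mapping are hypothetical:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def namespace_index(schemas):
    """Map each targetNamespace to the schema locations that declare it.

    `schemas` is a mapping of location (e.g. a file path) to the text of
    the schema document found there. Schemas without a targetNamespace
    are indexed under the empty string.
    """
    index = defaultdict(list)
    for location, text in schemas.items():
        root = ET.fromstring(text)
        index[root.get("targetNamespace", "")].append(location)
    return dict(index)
```

Several files can share one namespace (via xs:include), so each entry is a
list of candidate locations rather than a single answer.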