[studxml-users] Fastest way to count the number of elements with a specific name or at a specific depth?

Sun Sep 11 08:04:10 EDT 2022

Thibault de COINCY <decoincy.thibault at gmail.com> writes:

> What is the fastest way to count the number of elements with a specific
> name or at a specific depth?
>
> I use something like this, but it is very slow:
> 
> int BrandCount = 0;
> xml::parser parser(xml_stream, xml_path, xml::parser::receive_elements);
> for (xml::parser::event_type event (parser.next());
>      event != xml::parser::eof;
>      event = parser.next()) {
>   if (event == xml::parser::start_element) {
>     if (parser.name() == "brand") ++BrandCount;
>   }
> }

Yes, this is more or less how I would do it.

> Is there a way to speed up the process?

Have you tried building everything (your code as well as libstudxml)
with optimization enabled? I can see how this would be slow in
comparison with, say, a substring scan, but unless you are parsing
huge documents, this shouldn't matter much on modern hardware.
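
For a sense of what that comparison is against, a substring scan could
look roughly like the sketch below. It is not an XML parser and not part
of libstudxml: it will also count matches inside comments and CDATA
sections, and it knows nothing about namespaces. The function name
count_start_tags_naive is made up for illustration.

```cpp
#include <cstddef>
#include <string>

// Count occurrences of "<name" that are followed by whitespace, '/',
// or '>'. This is a plain text search, NOT a conforming XML scan:
// matches inside comments and CDATA sections are counted as well.
std::size_t count_start_tags_naive (const std::string& xml,
                                    const std::string& name)
{
  const std::string needle ("<" + name);
  std::size_t count (0);

  for (std::size_t p (xml.find (needle));
       p != std::string::npos;
       p = xml.find (needle, p + needle.size ()))
  {
    const std::size_t e (p + needle.size ());

    // Reject longer names such as <brands> or <brand-id>.
    if (e < xml.size () &&
        (xml[e] == '>' || xml[e] == '/' || xml[e] == ' ' ||
         xml[e] == '\t' || xml[e] == '\n' || xml[e] == '\r'))
      ++count;
  }

  return count;
}
```

This is fast because it does no tokenizing at all, which is also why
its result can be wrong for documents that mention the tag inside a
comment or a CDATA section.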

> Currently, the parser goes through every single element, is it possible to
> skip elements that are not "brand"?
> Given I'm not interested in getting the content of the elements but only
> the count, is there a way to optimize the parser somehow?

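Generally, it's impossible to "jump" over chunks of unparsed XML due
to its lexical structure: text that looks like a tag can appear inside
a comment or a CDATA section, and a '>' can appear inside an attribute
value, so the end of an element can only be found by scanning
everything before it. So any conforming parser will have to parse
the entire document. One could probably write a specialized parser
which will not bother with extracting and returning any data for the
uninteresting parts. But general-purpose XML parsers (like libstudxml)
normally assume that everything is of interest.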
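
To illustrate what such a specialized parser might look like, here is a
minimal sketch of a counting-only scanner. It is not libstudxml code and
not a conforming parser. It assumes the whole document is in memory, is
well-formed, and has no DOCTYPE internal subset; it compares raw
qualified names without any namespace handling. The names
element_counts and count_elements are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

struct element_counts
{
  std::size_t by_name = 0;   // start tags whose name matched
  std::size_t at_depth = 0;  // start tags at the requested depth (root is 1)
};

// True for characters that terminate an element name in a start tag.
static bool name_end (char c)
{
  return c == '>' || c == '/' || c == ' ' ||
         c == '\t' || c == '\n' || c == '\r';
}

// Assumes well-formed input without a DOCTYPE internal subset.
element_counts count_elements (const std::string& xml,
                               const std::string& name,
                               std::size_t target_depth)
{
  element_counts r;
  const std::size_t n (xml.size ());
  std::size_t depth (0), i (0);

  // Return the position just past `term`, searching from `from`.
  auto skip_past = [&xml, n] (const char* term, std::size_t from)
  {
    std::size_t p (xml.find (term, from));
    return p == std::string::npos ? n : p + std::strlen (term);
  };

  while ((i = xml.find ('<', i)) != std::string::npos)
  {
    if (xml.compare (i, 4, "<!--") == 0)           // comment
      i = skip_past ("-->", i + 4);
    else if (xml.compare (i, 9, "<![CDATA[") == 0) // CDATA section
      i = skip_past ("]]>", i + 9);
    else if (xml.compare (i, 2, "<?") == 0)        // processing instruction
      i = skip_past ("?>", i + 2);
    else if (xml.compare (i, 2, "<!") == 0)        // DOCTYPE (simplified)
      i = skip_past (">", i + 2);
    else if (xml.compare (i, 2, "</") == 0)        // end tag
    {
      if (depth != 0) --depth;
      i = skip_past (">", i + 2);
    }
    else                                           // start tag
    {
      std::size_t b (i + 1), e (b);
      while (e < n && !name_end (xml[e])) ++e;

      ++depth;
      if (xml.compare (b, e - b, name) == 0) ++r.by_name;
      if (depth == target_depth) ++r.at_depth;

      // Find the closing '>', ignoring any '>' inside quoted values.
      char quote (0);
      std::size_t j (e);
      for (; j < n; ++j)
      {
        const char c (xml[j]);
        if (quote != 0) { if (c == quote) quote = 0; }
        else if (c == '"' || c == '\'') quote = c;
        else if (c == '>') break;
      }

      // An empty-element tag (<x/>) closes immediately.
      if (j < n && xml[j - 1] == '/') --depth;
      i = j < n ? j + 1 : n;
    }
  }

  return r;
}
```

Note that even this stripped-down version still has to look at every
markup construct in the document; what it saves is the work of building
and returning names, attributes, and character data for the parts that
are not of interest.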