Comments on: libstudxml – modern XML API for C++

By: Boris Kolpackov

Boris Kolpackov — Wed, 21 May 2014 20:16:04 +0000

The parser needs the document name for diagnostics (error messages will look like “input.xml:12:23 …”).
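To illustrate why a stream alone is not enough: a `std::istream` carries no name, so anything that wants to print a “name:line:column” prefix has to be handed the name separately. The following is a minimal sketch using only the standard library; `diagnose` is a hypothetical helper made up for this example and is not part of libstudxml.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not a libstudxml function): scans `in` for the first
// occurrence of `bad` and returns a "name:line:column ..." message, or an
// empty string if the character does not occur. `bad` is assumed not to be
// a newline, since newlines are only used for line counting here.
std::string diagnose(std::istream& in, const std::string& name, char bad)
{
  unsigned long line = 1, column = 0;
  char c;
  while (in.get(c))
  {
    if (c == '\n')
    {
      // A newline starts the next line and resets the column counter.
      ++line;
      column = 0;
      continue;
    }

    ++column;
    if (c == bad)
    {
      // The stream cannot tell us where the data came from, so the
      // caller-supplied name is the only source for the prefix.
      std::ostringstream os;
      os << name << ':' << line << ':' << column
         << " error: unexpected character '" << bad << "'";
      return os.str();
    }
  }
  return std::string();
}
```

The same function works unchanged for a file stream, a string stream, or a socket wrapper; only the name passed alongside it changes.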

As for conversion of “male” to gender, that’s a good question that is answered in the documentation.
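For readers who want the general idea without reflection, one common approach in C++ is to give the enum a stream extraction operator and let a generic helper call it. The sketch below is independent of any particular library; the `gender` enum mirrors the one in the question, and `from_text` is a hypothetical helper, so this is not a description of how libstudxml itself does the conversion.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

enum class gender { male, female };

// Stream extraction for gender: maps the textual form to the enumerator
// and sets failbit for any other input.
std::istream& operator>>(std::istream& is, gender& g)
{
  std::string s;
  if (is >> s)
  {
    if (s == "male")
      g = gender::male;
    else if (s == "female")
      g = gender::female;
    else
      is.setstate(std::ios_base::failbit);
  }
  return is;
}

// Hypothetical generic helper: converts text to T using whatever
// operator>> is available for T, and throws if extraction fails.
template <typename T>
T from_text(const std::string& text)
{
  std::istringstream is(text);
  T value{};
  if (!(is >> value))
    throw std::invalid_argument("invalid value '" + text + "'");
  return value;
}
```

With this in place, `from_text<gender>("male")` yields `gender::male`, and the same helper works for `int`, `double`, and any other type that can be read from a stream.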

By: DeadMG

DeadMG — Wed, 21 May 2014 20:09:55 +0000

Why on earth does the parser need both the filename and the stream? Shouldn’t it only need the stream?

And how do you convert from “male” to gender::male? C++ does not support reflection.

By: Boris Kolpackov

Boris Kolpackov — Wed, 21 May 2014 15:30:43 +0000

Arseny, as I mentioned in my post, I discuss the issue of real XML parsers in much more detail in my talk, so I suggest that you check it out when the video is available. In a nutshell, the argument boils down to this: the intended use of XML is as a data interchange format, not just a data storage format. If your application is the sole producer and consumer of the data, then you might as well choose a more natural and efficient format than XML. So, assuming we use XML for data interchange, while your code may not use any CDATA sections or DTDs, it is only a matter of time before someone sends you a perfectly valid XML document that your application won’t be able to parse. Moreover, most of the “subset” parsers, including pugixml, don’t even document what happens when valid but unsupported XML constructs are encountered. Are they ignored? Is there an error? A crash? Nobody knows. In fact, you don’t even document that your XML parser only supports a subset of XML, which is what I find misleading.

So in my talk I suggested that people don’t corner themselves and instead stick to real XML parsers. There are plenty of conforming and fast implementations out there. And not a single person in the audience raised your “but it works in 99% of use cases” objection.

Regarding the in-memory vs streaming API argument (which is also covered in the talk extensively), most people think they need DOM but I think this is just because of the really bad streaming APIs that were available up to this point. So I tried to convince the audience that streaming is actually sufficient for the majority of cases. Plus, it is easy to go from streaming to in-memory but not the other way around. In fact, libstudxml has the ‘hybrid’ example which shows how to do hybrid, partially streaming/partially in-memory parsing and serialization.
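To make the “streaming to in-memory is easy” point concrete, here is a minimal sketch of building a tree from a sequence of parse events. The `event`, `element`, and `build_tree` names are hypothetical and only stand in for what a pull parser would hand back; this is not the libstudxml API and not the code from the ‘hybrid’ example.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event type standing in for the output of a pull parser.
enum class event_type { start_element, end_element, characters };

struct event
{
  event_type type;
  std::string value; // element name, or character data for `characters`
};

// Minimal in-memory node: a name, accumulated text, and child elements.
// The position of text relative to children is not preserved here.
struct element
{
  std::string name;
  std::string text;
  std::vector<std::unique_ptr<element>> children;
};

// Builds a tree from a flat event sequence, keeping a stack of the
// elements that are currently open.
std::unique_ptr<element> build_tree(const std::vector<event>& events)
{
  std::unique_ptr<element> root;
  std::vector<element*> open;

  for (const event& e : events)
  {
    switch (e.type)
    {
    case event_type::start_element:
      {
        std::unique_ptr<element> node(new element);
        node->name = e.value;
        element* raw = node.get();

        if (open.empty())
        {
          if (root)
            throw std::runtime_error("multiple root elements");
          root = std::move(node);
        }
        else
          open.back()->children.push_back(std::move(node));

        open.push_back(raw);
        break;
      }
    case event_type::end_element:
      if (open.empty() || open.back()->name != e.value)
        throw std::runtime_error("mismatched end of element");
      open.pop_back();
      break;
    case event_type::characters:
      if (open.empty())
        throw std::runtime_error("character data outside of an element");
      open.back()->text += e.value;
      break;
    }
  }

  if (!root || !open.empty())
    throw std::runtime_error("incomplete document");

  return root;
}
```

Going the other way is the hard part: once a library only exposes a fully built tree, the whole document has already been read into memory, so there is nothing left to stream.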

By: Arseny Kapoulkine

Arseny Kapoulkine — Wed, 21 May 2014 15:02:21 +0000

As the author of pugixml, I can’t help but comment on the “real XML parser” thing… Your link is very misleading: you stumbled upon a parser that can’t read content from valid XML at all, CDATA of all things. Great.

Fast/compact XML libraries solve real problems. As long as a parser can read valid XML, disregarding DTD entities, it’s applicable to 99% of real-world problems.

If I were to consider libstudxml vs pugixml in a project, I would *not* use this as a distinction point. A much more important thing is that all three parsers that you specified are DOM, and this parser is pull-based, so depending on the task at hand one or the other is more applicable.