[odb-users] HDF5 support?

Tue Feb 19 03:49:20 EST 2013

Hi Marcus,

Daniels, Marcus G <mdaniels at lanl.gov> writes:

> I'm not aware of anyone that uses HDF5 just as a file format anymore.
> I know of some cases of HDF4 being written by multiple independent
> clients, but HDF5 is a more complex standard and so I think everyone
> uses their libraries.

I didn't mean to say that we need to read the HDF5 file format ourselves.
Rather, I was referring to the file format vs database distinction on a
more conceptual level. I don't think HDF5 will fit very well into a database
model (where we have such notions as database, connection, transaction,
statement, etc). Instead, I see it as a much simpler API, something along
the lines of "save this object into this HDF5 output stream" and "load
this object from this HDF5 input stream". This would also fit other "file
formats", such as XML, JSON, etc.

> The kind of parallelism I have in mind would be decomposition of very
> large arrays. That is, suppose an X/Y/Z 3-D space is physically decomposed
> along the Z dimension on a large RAID array and a query is made to collect
> a subspace of it. Further suppose that several servers were associated
> with different parts of the Z dimension (each chunk of Z on a server
> and over several hard drives). There would be parallelism collecting
> the data for a partial Z chunk. With HDF5, this would be coordinated
> over different ranks of an MPI process.

Hm, not sure we want to get into all this MPI business with ODB. How is
this handled in HDF5? Is it the HDF5 library that controls the processes,
so that you basically give it the "memory regions" to fill and it fills
them in parallel? Or does it allow you to open several "parallel streams"
that you can then read from multiple processes that you control?

In other words, if ODB provides basic support for object serialization
to HDF5 streams, will this be sufficient to support parallelism?

Boris