[odb-users] Support for Bulk Copy

Thu Nov 3 06:06:25 EDT 2011

Hi Ken,

Ken Feng <kfmfe04 at gmail.com> writes:

> In my current code, I use Mysql's LOAD DATA INFILE to bulk copy
> thousands of rows in and out of the database.
> 
> How would I translate this code into odb?

I would first load the file into a list of C++ objects. Then split
all the objects into chunks (you can play with the chunk size to
find the optimal value). Then start a transaction and use the
normal persist() function to store objects in each chunk. Once
the whole chunk is handled, I would commit the transaction and
repeat with the next chunk.

It would actually be interesting to compare the performance numbers
of the two approaches. The LOAD DATA INFILE has the advantage of
loading data directly into InnoDB without any SQL. But if the input
file is very large, the undo log can grow uncontrollably, which will
slow things down considerably, as discussed in this article:

http://www.mysqlperformanceblog.com/2008/07/03/how-to-load-large-files-safely-into-innodb-with-load-data-infile/

That's the reason why in the above approach I suggested chunking the
object list -- at the end of each transaction the undo log will be
cleared.

On the other hand, the persist() approach does use SQL but it also
uses prepared statements, so SQL parsing should not be an issue.
ODB also sends all the data in binary form, so there is no need to
convert from string representation (or, at least, such conversion can
be done on the client side, as opposed to the database server). In
contrast, LOAD DATA INFILE will do the conversion on the database
server. So as long as you have a fast connection between the client
and server (e.g., local UNIX socket or gigabit ethernet), the data
path in the ODB approach shouldn't be a problem either.

The other thing that you can do to speed up the ODB transaction, is
to temporarily disable foreign and unique key checks:

using namespace odb::core;

database& db = ...
connection_ptr c (db.connection ());

c->execute ("SET FOREIGN_KEY_CHECKS = 0");
c->execute ("SET UNIQUE_CHECKS = 0");

for (/*chunk loop*/
{
  transaction t (c->begin ());

  ...  

  t.commit ();
}

c->execute ("SET UNIQUE_CHECKS = 1");
c->execute ("SET FOREIGN_KEY_CHECKS = 1");

If you use replication, then disabling binary logging may also be a
good idea (SET SQL_LOG_BIN = 0;)

Boris