EXTERNAL: Re: [odb-users] Optimal way of checking if DB entry exists before persisting?

Fri Jun 24 10:53:43 EDT 2011

Boris,

Thank you for the suggestions. I believe I will be in the situation where the object will be in the database more often than it is not. So I will use your suggestion either to use "find" or to maintain a local map since the module I am developing will be the only one persisting that object type.

As for the comment on the commits. I recall some tests I did in the past assessing sqlite speed when directly calling the sqlite api from C. Suppose I was to say commit 50,000 entries. I thought I remember that if I were to commit after binding every entry, it was an order of magnitude slower than if I were to commit after binding all 50,000 entries at first. My memory may not be serving me accurately though ...

In either case, I agree with your counter concept and will consider it if performance becomes an issue (which it hasn't yet).

Thanks again,

Tom

-----Original Message-----
From: Boris Kolpackov [mailto:boris at codesynthesis.com] 
Sent: Friday, June 24, 2011 7:19 AM
To: Szumowski, Thomas
Cc: odb-users at codesynthesis.com
Subject: EXTERNAL: Re: [odb-users] Optimal way of checking if DB entry exists before persisting?

Hi Thomas,

Szumowski, Thomas <thomas.szumowski at lmco.com> writes:

> Suppose I have have an "employee" object that as a member has a pointer
> to an "employer" object as in the one-way relationship example from the
> user guide. I am periodically receiving "employee" objects from an
> external system and would like to persist the employees in one table
> and corresponding employers in another (as in the example).
> 
> If I blindly try to persist the employer (before attempting to persist
> employee), and an object with that employer ID already exists in the 
> database, ODB throws an error stating the object is already persisted.
> As an alternative, upon receipt of an employee object, I can do the
> following:
> 
> - open transaction
> - Get employer ID from received employee object
> - Query the database to see if this employer ID exists
>                 - If it does, then call db->update(..)
>                 - If it doesn't, then call db->persist(..)
> - commit
> 
> This procedure is similar to that outlined at the end of section 2.6 
> from the manual.
> 
> My question are:
>
> (1) Is this the most efficient way of updating/persisting objects that
>     may already exist in the DB?

There are several ways to optimize this:

1. Instead of using the query() function you can use find(). find() is
   quite a bit faster.

2. If you know that in most cases the object will not be in the database,
   then you can do a speculative persist() call. If it fails with the
   object_already_persistent exception, then you fall back to the 
   load()/update() call sequence. This method doesn't work as well for
   the situation where in most cases the object will be in the database
   since you need to call load() (which is equivalent to find()) before
   calling update().

3. If your application is the only entity that persists/updates employer
   objects, then you can maintain a map of such objects that outlive
   individual transactions. This way you can avoid doing find().

> (2) Since the objects come in periodically, is there a more efficient
>     way of persisting each employee to the database? I assume if the
>     data rate is high, doing commits so often may be inefficient and
>     it may be better to commit based off some timer periodically and
>     cache. (Is there a similar feature in ODB?)

This depends on your application. If you have multiple applications
accessing the database or a multi-threaded application, then having
long transactions will hurt concurrency. While it depends on the
database, I am also not certain that having a one long transaction
instead of a series of smaller ones will necessarily lead to better
performance.

In the above scenario, you could combine the transactions that persist
the employee and employer object into one transactions. Also, while ODB
doesn't support some sort of a timer-based auto-commit, you can easily
implement something like this yourself. For example, by counting the
number of objects that were updated/persisted and committing the current
transaction after a certain threshold has been reached.

Boris