Using an embedded database to simplify device data replication and synchronization - Embedded.com

Anyone with a PDA is familiar with “synchronization”, the act of transferring information between the handheld and the desktop to guarantee identical data on both devices.

The ability to take all or part of a database's content, move it into a separate database, modify either database, and later re-connect the two and reconcile their differences opens all sorts of possibilities for truly intelligent mobile devices. One can imagine scenarios ranging from handheld inventory tracking appliances to scientific data collection instruments.

Synchronization is closely related to “replication”; in fact, the terms are often used synonymously. Typically, however, replication refers to creating a clone of all or part of an original database, while synchronization refers to resolving the differences between two databases, one of which carries data that was replicated from the other.

Regardless of which term one uses, the mechanism for copying a subset of a database so that the copy can be manipulated remotely from the original, then re-connected and reconciled in a consistent fashion, is as tricky as it is powerful. A great deal of under-the-covers bookkeeping must occur for a proper implementation.

In this article, we will look at some of that bookkeeping and, before the complexity frightens you off, present an open-source database that deals with it for you, so you can employ replication (and synchronization) in your next mobile application with relatively little mental pain.

First, however, the travails that you would otherwise have to wrestle with yourself.

Pitfalls of replication
To successfully implement replication, you must clear a number of hurdles that, at first, you might not be aware of. To illustrate, let's assume that we have a database on a desktop system. We want to copy a subset of that database's content to a mobile device, carry the device elsewhere and modify its database, then reconnect to the “parent” database and synchronize the two.

At this stage, we will not be concerned with the kind of databases we're using (whether relational or object-oriented) nor the nature of the data stored in them. We will simply say that our databases contain “entities”. In a relational database, an entity would probably be a row in a table; in an object database, an entity would likely be an object. Regardless, our focus at this point will be on the tricky issues surrounding replication.

Tricky issue number one is maintaining a logical connection between corresponding entities in both the original and the mobile databases. That is, when we replicate data from the desktop database to the mobile device, we need some way of determining that two entities, each in a different database, actually represent the same thing.

This logical tether between entities is obviously critical. When we re-connect the two databases, if an entity has been modified in the mobile database, we have to apply that modification to the corresponding entity in the original database, which means we have to know which entity in the original database corresponds to the modified entity in the mobile database.

Establishing and maintaining such a connection between two entities is not as simple as you might think. Obviously, we need a unique identifier that we can attach to entities; an identifier that is identical for the original entity and its replicated 'twin'. And, when you ponder the matter further, you realize that it must be a universally unique identifier.

Suppose that, after we replicate the master database into the mobile device database and disconnect the mobile device, we create a new entity in the original database and a different, new entity in the mobile database. We must be guaranteed that the identifiers created for both new entities are indeed unique (there is no chance that they match).

If, by some chance, the identifiers did match, then our databases are likely to become corrupted when they are re-connected. The synchronization software will incorrectly deduce that the two different entities are the same. Who knows what sorts of errors will result?

Dealing with multiple mobile databases
Furthermore, what happens if the original database is replicated into multiple mobile databases? The classic example of this arrangement is a master database of customer information that feeds into salespeople's mobile databases. All the salespeople of the organization replicate a subset of the master database into their mobile devices, then travel out to the field to meet with customers, enter new data, modify data, and so on.

Later, each salesperson re-synchronizes his or her mobile database with the home office's master database. No matter how many mobile databases might be created from the original, and no matter how many new entities are added to each mobile database, all distinct entities must have unique identifiers.
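
One common way to obtain identifiers that need no coordination between disconnected databases is a random UUID. The sketch below is only an illustration using the standard `java.util.UUID` class; the `Entity` class is hypothetical, and this is not a description of how any particular database engine builds its identifiers.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

public class UuidSketch {
    /** A hypothetical entity that receives its identifier at creation time. */
    static final class Entity {
        final UUID id = UUID.randomUUID(); // random (version 4) UUID
        final String label;

        Entity(String label) {
            this.label = label;
        }
    }

    public static void main(String[] args) {
        // Simulate a master and a mobile database creating entities
        // independently, with no shared counter between them.
        Set<UUID> seen = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            seen.add(new Entity("master-" + i).id);
            seen.add(new Entity("mobile-" + i).id);
        }
        // A collision between random UUIDs is possible in principle,
        // but so improbable that it is normally treated as impossible.
        System.out.println("distinct identifiers: " + seen.size());
    }
}
```

Because each side generates identifiers on its own, the scheme keeps working however many mobile databases are created from the original.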

Tricky issue number two is determining which entities have been modified. Put another way, we need a mechanism that allows us to ascertain which entities in the two databases have been altered since the mobile database was created from the original.

We could design the synchronization process so that it looks at ALL the entities in each database, examining each corresponding pair for differences. This technique, however, becomes more time consuming as the number of entities rises. As the database grows, the synchronization process spends more and more time examining unmodified entities simply to find that they are unchanged.

Dirty flags
One solution would be to associate a “dirty flag” with each entity. An entity's dirty flag is set whenever that entity is modified, and cleared by the synchronization process. This technique would certainly quicken synchronization. A dirty flag is, in fact, used by the Palm OS to manage modifications. Every record in a Palm database carries a dirty flag that is set whenever the record is modified, and cleared by a subsequent synchronization process.
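
The dirty-flag idea can be sketched in a few lines of Java. The `Record` class and the `collectAndClear()` helper are hypothetical names chosen for this illustration; they do not reproduce the Palm OS record format or any real API.

```java
import java.util.ArrayList;
import java.util.List;

public class DirtyFlagSketch {
    /** A hypothetical record that marks itself dirty on every modification. */
    static final class Record {
        private String value;
        private boolean dirty;

        Record(String value) {
            this.value = value; // a freshly loaded record starts out clean
        }

        void setValue(String value) {
            this.value = value;
            this.dirty = true;  // set whenever the record is modified
        }

        String getValue() { return value; }
        boolean isDirty() { return dirty; }
        void clearDirty() { dirty = false; }
    }

    /** Returns the modified records and clears their flags, as a sync pass would. */
    static List<Record> collectAndClear(List<Record> records) {
        List<Record> changed = new ArrayList<>();
        for (Record r : records) {
            if (r.isDirty()) {
                changed.add(r);
                r.clearDirty();
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Record> records = new ArrayList<>();
        records.add(new Record("Bob"));
        records.add(new Record("Bill"));
        records.get(1).setValue("William");

        System.out.println(collectAndClear(records).size()); // prints 1
        System.out.println(collectAndClear(records).size()); // prints 0
    }
}
```

The pass still visits every record, but it only tests a boolean instead of comparing each pair of entities field by field.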

But the difficulty of identifying modified entities is complicated by the fact that updates can occur on both the mobile and the original database. In that case, a dirty flag may not be sufficient. We may need to identify the time of the modification, so we can deduce which entity is the “older” entity.

Yet even that, by itself, might not be good enough. The fact that one modification occurs later than another may not be sufficient to qualify the later modification as the one that should prevail in a synchronization.

Ideally, our synchronization code should allow us to specify 'conflict resolution' algorithms for determining which entity is the 'winner', and therefore overwrites the other. This conflict resolution process could be informed by each object's modification time, and might even require user intervention.
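
A pluggable conflict-resolution policy might look something like the following sketch. The `Version` class, the `ConflictResolver` interface and the two sample policies are all hypothetical names invented for this illustration.

```java
public class ConflictSketch {
    /** Hypothetical snapshot of one entity as it exists on one side. */
    static final class Version {
        final String value;
        final long modifiedAt; // modification time, e.g. milliseconds since the epoch

        Version(String value, long modifiedAt) {
            this.value = value;
            this.modifiedAt = modifiedAt;
        }
    }

    /** Pluggable policy: given both sides of a conflict, pick the winner. */
    interface ConflictResolver {
        Version resolve(Version master, Version mobile);
    }

    /** The later modification prevails; on a tie the master copy is kept. */
    static final ConflictResolver LATEST_WINS =
            (master, mobile) -> mobile.modifiedAt > master.modifiedAt ? mobile : master;

    /** The master copy always prevails, whatever the timestamps say. */
    static final ConflictResolver MASTER_WINS = (master, mobile) -> master;

    public static void main(String[] args) {
        Version master = new Version("edited at the office", 100L);
        Version mobile = new Version("edited in the field", 200L);

        System.out.println(LATEST_WINS.resolve(master, mobile).value); // prints "edited in the field"
        System.out.println(MASTER_WINS.resolve(master, mobile).value); // prints "edited at the office"
    }
}
```

Keeping the policy behind an interface leaves room for other resolvers, including one that asks the user.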

Working out a scheme for tracking entities
None of the intricacies of replication and synchronization so far described are beyond the reach of a good programmer and decent code. Concocting a technique for generating a globally unique identifier (GUID), associating that GUID with entities in the database, and working out a scheme for tracking entities that have been modified since the last synchronization: all these are well within the capabilities of a moderately good programmer.

However, even a moderately good programmer would probably rather get on with the coding of the actual database application than spend his or her time designing and coding a replication/synchronization system. Luckily, there is a small-footprint, open-source database that manages virtually all the details so far described.

The database is called db4o, an embeddable object database engine available from www.db4objects.com. It is embeddable in the sense that the db4o engine is delivered as a library that you link into your application, running in the same process space as your application rather than operating in client/server fashion. (However, there is a client/server variant of db4o, should an application require that architecture.) Versions of db4o exist for Java and .NET (also Mono). The examples in this article will be in Java, but everything done here could also be done in .NET.

While db4o is an object database, it places no practical restrictions on the sorts of objects that can be persisted. It will happily handle simple objects as easily as arrays, collections, and even complex trees or networks of objects. Nor must the classes of persistent objects be specially augmented.

Some object databases require persistent classes to be descended from a persistence-aware parent, or to implement special persistence-enabling interfaces. db4o has no such requirements. The simplicity of the db4o API is possibly its most powerful characteristic.

For the following discussion, we will create a pair of classes whose objects are to be made persistent and, subsequently, replicated. We will pretend that we have a customer database, and we wish to replicate that database into a portable device so that a company employee can take that device into the field and record payments made by customers. Later, the mobile database will be re-connected to the 'parent' database, and synchronized.

An amended version of the Customer class is shown below:

import java.util.ArrayList;

public class Customer {
    String name;
    int accountid;
    ArrayList payments;   // holds Payment objects

    // Constructors
    public Customer(String _name,
        int _accountid)
    {
        this.name = _name;
        this.accountid = _accountid;
        this.payments = new ArrayList();
    }

… rest of class …

We've left out the details of the accessor methods to keep things simple. As you can see, each Customer object carries a reference to an ArrayList of Payment objects. The Payment class looks like this:

public class Payment {
    long invoicenum;
    long amount; // Payment in cents

    // Constructors
    public Payment(long _invoicenum,
        long _amount)
    {
        this.invoicenum = _invoicenum;
        this.amount = _amount;
    }

… rest of class …

Again, we've left out the access methods for simplicity's sake.

The intent of our class structure should be clear. Once a Customer object is entered into the database, information is collected concerning that customer's payments, and stored as Payment objects in each Customer object's payments ArrayList.
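
For readers who want something they can compile without the database, here is one possible way the elided parts of the two classes could be filled in. Everything beyond the fields and constructors shown in the listings is an assumption: the no-argument constructor, `setName()`, `addPayment()` and `totalPaid()` are guesses at plausible accessors, not the article's actual code. The classes are nested only so the sketch fits in one file.

```java
import java.util.ArrayList;

public class CustomerSketch {
    static class Payment {
        long invoicenum;
        long amount; // payment in cents

        Payment(long invoicenum, long amount) {
            this.invoicenum = invoicenum;
            this.amount = amount;
        }
    }

    static class Customer {
        String name;
        int accountid;
        ArrayList<Payment> payments;

        // Assumed: an "empty" customer, handy as a query template
        Customer() {
            this(null, 0);
        }

        Customer(String name, int accountid) {
            this.name = name;
            this.accountid = accountid;
            this.payments = new ArrayList<>();
        }

        // Assumed accessor
        void setName(String name) {
            this.name = name;
        }

        // Assumed helper: records a payment against this customer
        void addPayment(long invoicenum, long amount) {
            payments.add(new Payment(invoicenum, amount));
        }

        // Assumed helper: sum of all payments, in cents
        long totalPaid() {
            long total = 0;
            for (Payment p : payments) {
                total += p.amount;
            }
            return total;
        }
    }

    public static void main(String[] args) {
        Customer bob = new Customer("Bob", 1);
        bob.addPayment(1, 10000);
        bob.addPayment(2, 20000);
        System.out.println(bob.name + " has paid " + bob.totalPaid() + " cents"); // prints "Bob has paid 30000 cents"
    }
}
```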

So, to demonstrate replication, we'll create a 'master' database, populate it with Customer and Payment information, then replicate that database into a 'mobile' database. We will modify one of the Customer objects in the mobile database by adding a new Payment object, and synchronize the mobile database with the master database. If all goes well, we will end up with identical master and mobile databases.

The first step, then, is a program that creates and populates the master database. This code is shown below:

public static void main(String[] args) {
    Customer customer1;
    Customer customer2;

    // Configure db4o so we can do replication
    Db4o.configure().generateUUIDs(Integer.MAX_VALUE);
    Db4o.configure().generateVersionNumbers(Integer.MAX_VALUE);

    // Create the new database file.
    // Delete it if it exists
    new File("customer.YAP").delete();
    ObjectContainer db = Db4o.openFile("customer.YAP");

    // Build a couple of Customers
    // Add a couple of payments to the first
    customer1 = new Customer("Bob", 1);
    customer2 = new Customer("Bill", 2);

    // Arguments are invoice number and amount in cents
    customer1.addPayment(1, 10000);
    customer1.addPayment(2, 20000);

    // Store both customers in the database
    db.set(customer1);
    db.set(customer2);
    db.commit();
    db.close();
}

The first two calls in this application are to the generateUUIDs() and generateVersionNumbers() methods. Notice that these are methods in the configuration API of the Db4o object (which represents the db4o database engine). These calls are necessary because our database is going to support replication.

Recall that we said that, for replication to work correctly, we need to be able to uniquely identify each object, and keep track of each object's version number. The call to generateUUIDs() accomplishes the former, and the call to generateVersionNumbers() accomplishes the latter.

These two methods activate mechanisms internal to db4o so that unique identifiers and version numbers are automatically generated for us by the db4o database engine.

Once we have configured the db4o engine, we create the database by first ensuring that the database file doesn't exist, then by calling the openFile() method on the Db4o object. This call simultaneously creates the database file, and provides a reference to the database's associated ObjectContainer. (“ObjectContainer” is db4o parlance for the database itself.)

Next, we create two Customer objects: “Bob” and “Bill”. In addition, we attach a pair of payments to Bob: one for $100, another for $200. We store those objects into the database with a call to db.set(). Notice that we didn't have to tell db4o what those objects looked like.

There are no schema files to tell db4o that a Customer object includes a reference to an ArrayList, and that the ArrayList must be stored as well as the Customer object. db4o figures it all out by itself. db4o “spiders” through an object's tree and, unless we tell it otherwise, automatically stores all objects that the 'base' object references.

Finally, we call db.commit(). Any time an operation is performed on a db4o ObjectContainer that modifies the database, db4o invisibly starts a transaction. All we have to do is commit that transaction (which guarantees that the changes to the database will be permanent, even if the system were to somehow crash). We should note that db4o also supports rolling back (aborting) a transaction, which we could invoke with a call to db.rollback().

If we need to verify that our objects have been correctly stored in the database, we can peek into the customer.YAP file with db4o's ObjectManager. The ObjectManager is a kind of database explorer with which we can examine a database's contents, and explore the relationships among objects stored within that database.

The screenshot in Figure 1, below, shows the ObjectManager opened on the Customer database. In the Stored Classes frame, we can see that the Customer database does include both Customer objects and Payment objects. Furthermore, in the right-hand frame, a tree view of the “Bob” Customer object shows that Bob's payments ArrayList contains two Payment objects, as it should.

Figure 1: The ObjectManager opened on the Customer database

With our parent database created, we can now replicate the objects from it into the mobile database. The following code is all we need to accomplish this:

    // Create a new mobile database
    new File("customerMobile.YAP").delete();
    ObjectContainer dbMobile = Db4o.openFile("customerMobile.YAP");

    // Open the parent database
    ObjectContainer db = Db4o.openFile("customer.YAP");

    // Create a replication session
    ReplicationSession replication = Replication.begin(db,
        dbMobile);

    // Replication is driven by a query
    ObjectSet changed =
        replication.providerA().objectsChangedSinceLastReplication();
    while (changed.hasNext())
        replication.replicate(changed.next());

    replication.commit();

    // Close everyone
    db.close();
    dbMobile.close();

The code begins by creating the mobile database and opening the original, 'master' database. The mechanics of replication are encapsulated in the ReplicationSession interface.

We create a ReplicationSession object using the Replication.begin() factory method, specifying the master database first, and the mobile database second. This ordering sets the direction; we are telling the replication system that objects in the original database are to be replicated into the new, mobile database.

Replication itself is driven by a kind of query. The call replication.providerA().objectsChangedSinceLastReplication() retrieves from the first replication provider (providerA, which maps to db, the master database) all those objects that have changed since the last replication. Because no replication has taken place yet, this retrieves ALL the objects in the original database. The list of objects is made available in the ObjectSet (changed).

At this point, all we need to do is iterate through the ObjectSet, calling replicate() on each item returned. It's that simple.
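
To see why identifiers plus version numbers are enough to answer “what has changed since the last replication?”, consider the toy model below. It is plain Java and deliberately not db4o code: the `Store` class, its fields and the `replicate()` helper are hypothetical, conflicts are ignored, and db4o's real bookkeeping may differ considerably.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class ReplicationSketch {
    /** Toy store: every object has a UUID and the version at which it last changed. */
    static final class Store {
        long clock = 0;           // store-wide version counter
        long lastReplicated = 0;  // value of 'clock' at the last replication
        final Map<UUID, String> values = new LinkedHashMap<>();
        final Map<UUID, Long> versions = new HashMap<>();

        UUID create(String value) {
            UUID id = UUID.randomUUID();
            put(id, value);
            return id;
        }

        /** A local modification: stamp the object with a new version. */
        void put(UUID id, String value) {
            values.put(id, value);
            versions.put(id, ++clock);
        }

        /** An incoming copy: store it without marking it as a local change. */
        void putReplicated(UUID id, String value) {
            values.put(id, value);
            versions.put(id, lastReplicated);
        }

        List<UUID> changedSinceLastReplication() {
            List<UUID> out = new ArrayList<>();
            for (UUID id : values.keySet()) {
                if (versions.get(id) > lastReplicated) {
                    out.add(id);
                }
            }
            return out;
        }
    }

    /** One-way replication from source to target; returns the number of objects copied. */
    static int replicate(Store source, Store target) {
        List<UUID> changed = source.changedSinceLastReplication();
        for (UUID id : changed) {
            target.putReplicated(id, source.values.get(id));
        }
        source.lastReplicated = source.clock;
        return changed.size();
    }

    public static void main(String[] args) {
        Store master = new Store();
        Store mobile = new Store();
        master.create("Bob");
        UUID bill = master.create("Bill");

        System.out.println(replicate(master, mobile)); // prints 2: first run copies everything
        mobile.put(bill, "Bill + new payment");        // edit made in the field
        System.out.println(replicate(mobile, master)); // prints 1: only Bill changed
        System.out.println(replicate(master, mobile)); // prints 0: nothing new to send
    }
}
```

In this model the first replication copies everything simply because every version number is newer than the initial “last replicated” mark of zero.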

With the mobile database populated, we can modify one of its objects. We do so with the following code:

    Db4o.configure().activationDepth(4);
    Db4o.configure().updateDepth(4);

    // Open mobile database
    ObjectContainer dbMobile = Db4o.openFile("customerMobile.YAP");

    // Query for Bill
    Customer custTemplate = new Customer();
    custTemplate.setName("Bill");
    ObjectSet result = dbMobile.get(custTemplate);
    if (!result.hasNext()) {
        System.out.println("Could not find Bill");
        System.exit(0);
    }

    // Load Bill from database
    Customer customer1 = (Customer) result.next();

    // Add a new payment to Bill
    customer1.addPayment(22, 40000);

    // Save Bill back to database
    dbMobile.set(customer1);

    // Commit and close
    dbMobile.commit();
    dbMobile.close();

New in the above code are calls to activationDepth() and updateDepth(), which are applied to the Db4o configuration API. As you've already seen, when we store an object into a db4o database, the database engine also stores any reachable objects, that is, objects referenced by the base object being stored.

However, when we fetch an object from the database, db4o does not automatically fetch all reachable objects; it only fetches reachable objects up to a given depth, called the 'activation' depth. Likewise, if we update an object in the database (put it back after fetching and modifying it), db4o only re-stores reachable objects up to a given 'update' depth.

Consequently, the calls to activationDepth() and updateDepth() tell db4o how far into the object tree to reach when fetching and updating objects in the database. Setting both to 4 ensures that when we fetch or update a Customer object, we will also fetch and update the associated Payment objects.
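
The notion of reaching into an object tree only to a given depth can be pictured with a small depth-limited traversal. This is a conceptual sketch in plain Java, not db4o's implementation: the `Node` class and `reachable()` method are hypothetical, the root is counted as level 1 purely by convention here, and no claim is made about how db4o itself counts levels.

```java
import java.util.ArrayList;
import java.util.List;

public class DepthSketch {
    /** Hypothetical node in an object tree. */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Names of all nodes within 'depth' levels of the root (root = level 1). */
    static List<String> reachable(Node root, int depth) {
        List<String> out = new ArrayList<>();
        visit(root, depth, out);
        return out;
    }

    private static void visit(Node node, int depth, List<String> out) {
        if (depth <= 0) {
            return; // beyond the configured depth: leave this subtree untouched
        }
        out.add(node.name);
        for (Node child : node.children) {
            visit(child, depth - 1, out);
        }
    }

    public static void main(String[] args) {
        Node payments = new Node("payments")
                .add(new Node("payment-1"))
                .add(new Node("payment-2"));
        Node customer = new Node("customer").add(payments);

        System.out.println(reachable(customer, 2)); // prints [customer, payments]
        System.out.println(reachable(customer, 4)); // prints [customer, payments, payment-1, payment-2]
    }
}
```

With too small a depth the traversal stops at the list and never touches the payments, which is the situation the two configuration calls are meant to avoid.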

We actually fetch the object associated with customer “Bill” by executing what db4o refers to as a query by example (QBE). We do this by building a template object, custTemplate, and setting the name field to the name we want matched in the database. All other fields are left zero or empty.

We then pass that to db4o's get() method, and db4o will return all objects in the database whose fields match the non-empty/non-zero fields of the template. In our case, there is only one “Bill” object in the database, so we withdraw that object from the returned ObjectSet collection, add a new payment to Bill's payments ArrayList, and put Bill back in the database with a set() call. A commit() and a close(), and our mobile database has been modified.
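
The matching rule behind query by example can be imitated with a few lines of reflection. The sketch below is only a model of the idea, restricted to null and zero-valued numeric fields; the `QbeSketch` class and its methods are hypothetical and say nothing about how db4o evaluates such queries internally.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class QbeSketch {
    /** Cut-down customer with just two fields, for illustration. */
    static class Customer {
        String name;
        int accountid;

        Customer(String name, int accountid) {
            this.name = name;
            this.accountid = accountid;
        }
    }

    /** A field is "unset" in the template when it is null or numerically zero. */
    static boolean isUnset(Object value) {
        if (value == null) {
            return true;
        }
        if (value instanceof Number) {
            return ((Number) value).doubleValue() == 0.0;
        }
        return false;
    }

    /** True when every set field of the template equals the candidate's field. */
    static boolean matches(Object template, Object candidate) {
        if (template.getClass() != candidate.getClass()) {
            return false;
        }
        try {
            for (Field f : template.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers())) {
                    continue;
                }
                f.setAccessible(true);
                Object wanted = f.get(template);
                if (isUnset(wanted)) {
                    continue; // unset fields act as wildcards
                }
                if (!wanted.equals(f.get(candidate))) {
                    return false;
                }
            }
            return true;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns every stored object that matches the template. */
    static List<Object> query(List<?> stored, Object template) {
        List<Object> hits = new ArrayList<>();
        for (Object candidate : stored) {
            if (matches(template, candidate)) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Object> stored = new ArrayList<>();
        stored.add(new Customer("Bob", 1));
        stored.add(new Customer("Bill", 2));

        List<Object> hits = query(stored, new Customer("Bill", 0));
        System.out.println(hits.size());                     // prints 1
        System.out.println(((Customer) hits.get(0)).name);   // prints Bill
    }
}
```

One consequence of this rule is visible in the model: a template cannot ask for a field that is genuinely zero, since zero means “don't care”.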

Finally, we can synchronize the mobile database with the parent database, using the following code:

    // Open mobile database
    ObjectContainer dbMobile = Db4o.openFile("customerMobile.YAP");

    // Open the parent database
    ObjectContainer db = Db4o.openFile("customer.YAP");

    // Create a replication session
    ReplicationSession replication = Replication.begin(dbMobile,
        db);

    // Replicate back to parent
    ObjectSet changed =
        replication.providerA().objectsChangedSinceLastReplication();
    while (changed.hasNext())
        replication.replicate(changed.next());

    replication.commit();

    // Close everyone
    db.close();
    dbMobile.close();

This code looks virtually identical to the code we used to replicate from parent to mobile database. This is because, as far as db4o is concerned, replication and synchronization are really the same thing. The only notable difference in this piece of code is the order of the arguments in the call to Replication.begin().

Recall that replication “flows” from the first argument to the second. So, in the code above, the replication source is now the mobile database (dbMobile), and the replication destination is the master database (db).

In Sync
While synchronization/replication can be an involved process if managed at the application level, having the mechanism incorporated directly into the database engine simplifies matters significantly.

As we've shown, db4o's replicat
