The proper care and feeding of object databases in embedded systems - Embedded.com

The proper care and feeding of object databases in embedded systems

Truly intelligent embedded systems must not only think; they must also know, remember, and thereby learn. The thinking engine is the processor, aided in its learning by the part that knows and remembers: persistent storage.

Whenever one considers the persistent storage of complex data, one assumes that the infrastructure for that storage is some form of relational database. This, at least, is true of desktop and enterprise systems. And, given that the fruits of technology employed on 'high-end' systems tend to tumble down into the embedded world, the conclusion that any persistence mechanism for complex data on an embedded system must also be supported by a relational database seems inescapable.

However, object-oriented languages have not only established themselves as the choice for enterprise and desktop applications, they have also made significant inroads in handheld and embedded applications. Given the widespread use of object-oriented languages, perhaps it's time for embedded developers to consider a persistent storage infrastructure more in harmony with the application language: the object database management system (ODBMS).

This suggestion might at first appear to be contrary to the received wisdom of best practices for embedded software development. Processing power and memory space are often limited in an embedded system, and an object database would seem to be hostile to such environments. After all, a frequent complaint about object-oriented systems, particularly those built atop virtual machines, is that they consume too much processor time and memory. Wouldn't the same hold true for an ODBMS?

However, the ongoing use of object-oriented languages and systems has considerably improved the technology behind object-based systems. While the speed gap between a bytecode-executed application and a native-code application will never completely close, it is narrowing. Similarly, improvements in garbage collection algorithms (as well as programmer understanding of the garbage collection process) ameliorate memory concerns to some degree.

There are significant benefits to be had from choosing an object database over a relational alternative, benefits this article hopes to spotlight. To illustrate the points we'll be making, we have chosen the open-source object database db4o as our archetype. Db4o can be downloaded from www.db4objects.com, and is available in Java and C# versions. (Our examples will use Java.)

Performance and Memory Footprint
Two major concerns were alluded to at the outset of this article: the processor horsepower and the memory real estate needed to support an ODBMS might make such a database engine unusable.

Not necessarily.

Db4o's execution time competes well with a relational database “back-end”. In a recent paper (Comparing the Performance of Object Databases and ORMs) published by the Department of Computer Science at the University of Pretoria, db4o was tested against a popular open-source object/relational database, and found to be the faster of the two: over 40% faster in some cases. The authors (Pieter van Zyl et al.) concluded that the object/relational database was superior only in “isolated cases.” Db4o's memory use is also favorable; the db4o library's memory consumption is about 400K. When executing, the library requires about a megabyte total to support the engine and its activities. In addition, db4o's API provides methods for tuning memory resource consumption.

In less visible, but still practical terms, db4o (and any ODBMS, for that matter) requires no SQL interpreter or execution engine. Meanwhile, if you choose a relational database as the back-end storage for your application, you will almost certainly be writing some SQL code. Your application will have to “step into” SQL to handle the database logic, and the SQL you write is typically encoded in strings. The strings must be parsed and executed by some kind of SQL engine.

There are two immediate results of this. First, processor cycles will be consumed by the interpreter at runtime. SQL code cannot be executed directly; it must be parsed and interpreted. Second (and less obvious), because the parsing happens at runtime, syntactic errors in the SQL slip past the compiler. Something as simple as a misspelled word in an SQL statement could find its way into the executable. The subsequent failure would require debugging time, re-compilation time, and so on.

A less immediate result is the overhead injected into the application by the fact that including an RDBMS in an object-oriented application places two paradigms under one roof. Data flowing between these two paradigms must be translated from one to the other as it crosses the invisible boundary between them. That translation requires code, and that code eats memory and processor cycles.

This overhead becomes apparent if we compare the code required to read an object from a relational database with the similar code required for the object database db4o. Let's assume that we have a reasonably simple class called Datapoint, and we want to read objects of that class from the database. In fact, we want to read ALL of the objects of that class from the database.

// Represents a datapoint
public class Datapoint {
    public int deviceID;
    public int sensorID;
    public java.sql.Timestamp readingTime;
    public float value;

    // … datapoint's methods …
}
Listing 1. The Datapoint class.

In Listing 1 above, we have omitted Datapoint's methods, and defined Datapoint's members as public to keep things simple.

Reading the objects from a relational database would look something like the code in Listing 2, below. In this code, we will assume that the developer has chosen to store a single class per table. (In this instance, all Datapoint objects are stored in table DATAPOINT.) We will also assume that an open connection has been made, and is represented by the connection object.

Statement queryStatement = connection.createStatement();
ResultSet rset = queryStatement.executeQuery(
    "SELECT DEVICEID, SENSORID, TIMESTAMP, VALUE FROM DATAPOINT");

while (rset.next())
{
    // Copy each column of the current row into a fresh Datapoint
    Datapoint tDatapoint = new Datapoint();
    tDatapoint.deviceID = rset.getInt("DEVICEID");
    tDatapoint.sensorID = rset.getInt("SENSORID");
    tDatapoint.readingTime = rset.getTimestamp("TIMESTAMP");
    tDatapoint.value = rset.getFloat("VALUE");

    // … do something with tDatapoint …
}

Listing 2. Fetching Datapoint objects from a relational database.

This code fetches an object from the database by actually fetching a row from a table. An “empty” object is then instantiated, and the fields are copied from the row into the object's members. Once that's done, the tDatapoint object can be manipulated by the application.

Equivalent code in db4o appears in Listing 3, below. Here, the object db is a handle to the database (an ObjectContainer, in db4o parlance), and corresponds to the JDBC connection object in Listing 2.

Datapoint tempDatapoint = new Datapoint();
ObjectSet rset = db.get(tempDatapoint);
while (rset.hasNext())
{
    Datapoint tDatapoint = (Datapoint) rset.next();

    // … do something with tDatapoint …
}

Listing 3. Fetching Datapoint objects from an object database.

In Listing 3, we've employed db4o's simplest query technique: query by example (QBE). QBE uses a 'template' object to determine which objects are retrieved by the query; template fields left at their default values place no constraint on the match. Since we have created an empty Datapoint object to use as the template, this has the effect of fetching all Datapoint objects from the database.
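To make the template idea concrete, here is a minimal sketch of how QBE-style matching can be pictured. It is not db4o's implementation: the QbeDatapoint and QbeSketch classes are hypothetical stand-ins of our own, and the sketch assumes only that a template field left at its Java default (null, zero, or false) acts as a wildcard.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Stand-in for the Datapoint class of Listing 1 (timestamp omitted for brevity).
class QbeDatapoint {
    public int deviceID;
    public int sensorID;
    public float value;

    QbeDatapoint() { }

    QbeDatapoint(int deviceID, int sensorID, float value) {
        this.deviceID = deviceID;
        this.sensorID = sensorID;
        this.value = value;
    }
}

// Hypothetical helper, not part of db4o: it only illustrates template matching.
class QbeSketch {

    // True if v is the Java default for its type (null, zero, or false).
    static boolean isDefault(Object v) {
        if (v == null) return true;
        if (v instanceof Number) return ((Number) v).doubleValue() == 0.0;
        if (v instanceof Boolean) return !((Boolean) v).booleanValue();
        if (v instanceof Character) return ((Character) v).charValue() == 0;
        return false;
    }

    // A candidate matches when every non-default template field equals its own.
    static boolean matches(Object template, Object candidate) {
        if (!template.getClass().isInstance(candidate)) return false;
        try {
            for (Field f : template.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers())) continue;
                f.setAccessible(true);
                Object wanted = f.get(template);
                if (isDefault(wanted)) continue;   // unset field: acts as a wildcard
                if (!wanted.equals(f.get(candidate))) return false;
            }
            return true;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns every stored object that matches the template.
    static <T> List<T> queryByExample(T template, List<T> stored) {
        List<T> hits = new ArrayList<>();
        for (T candidate : stored) {
            if (matches(template, candidate)) hits.add(candidate);
        }
        return hits;
    }
}
```

With an empty template every stored object matches; setting deviceID to 1 in the template narrows the result to the readings from device 1. One consequence of this matching rule is that a template cannot ask for a field that is actually zero, which is one reason an object database would offer richer query mechanisms alongside QBE.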

Notice that, with the object database, no query string is needed; the object is simply called forth out of the ObjectSet iterator. More importantly, the object is fetched “wholesale” (fully instantiated, with its fields populated), so the additional code in Listing 2 that must copy data from the rset object to the tDatapoint object is unnecessary. Hence, the corresponding object database code is shorter than the relational database code.

The object database's advantage shown above holds regardless of the complexity of the object. If the object fetched were the root of a complex collection (a binary tree, say) and we had set the “fetch depth” (referred to as the “activation depth” in db4o-speak) to account for the tallest possible tree in our database, a single call would have fetched the entire tree.

BinTreeRoot tempBTRoot = new BinTreeRoot();
ObjectSet rset = db.get(tempBTRoot);
while (rset.hasNext())
{
    BinTreeRoot BTRoot = (BinTreeRoot) rset.next();

    // … do something with BTRoot …
}
Listing 4. Fetching a binary tree from an object database.

The snippet in Listing 4 above fetches all the binary tree root objects (members of the BinTreeRoot class) from the database. And, assuming we have set the activation depth appropriately, it also fetches the entire tree (all the nodes that the BinTreeRoot object references).

Imagine, now, trying to do that with a relational database. Fetching and instantiating a binary tree would have required iterating through a series of SELECT statements (each pulling in a single node of the tree), guided by some form of tree traversal algorithm, and converting the data retrieved from the ResultSet into the binary tree object's members. The tree would have to be “wired together” explicitly in the application code.
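The “wiring” the relational approach demands might look something like the following sketch. Everything in it is assumed for illustration: the node table layout (ID, KEY, LEFT_ID, RIGHT_ID, with 0 meaning “no child”), the TreeNode and NodeRow classes, and the in-memory Map that stands in for the per-node SELECT a real application would issue through JDBC.

```java
import java.util.Map;

// Hypothetical in-memory node of a binary tree of int keys.
class TreeNode {
    int key;
    TreeNode left;
    TreeNode right;

    TreeNode(int key) { this.key = key; }
}

// One row of an assumed node table: (ID, KEY, LEFT_ID, RIGHT_ID).
class NodeRow {
    final int id;
    final int key;
    final int leftId;   // 0 means "no left child"
    final int rightId;  // 0 means "no right child"

    NodeRow(int id, int key, int leftId, int rightId) {
        this.id = id;
        this.key = key;
        this.leftId = leftId;
        this.rightId = rightId;
    }
}

class TreeWiringSketch {

    // Rebuilds the subtree rooted at the given row ID. Each table.get() call
    // stands in for one "SELECT ... WHERE ID = ?" round trip to the database.
    static TreeNode load(int id, Map<Integer, NodeRow> table) {
        if (id == 0) return null;
        NodeRow row = table.get(id);
        if (row == null) return null;            // dangling reference
        TreeNode node = new TreeNode(row.key);   // copy column data into the object
        node.left = load(row.leftId, table);     // wire up the children by hand
        node.right = load(row.rightId, table);
        return node;
    }

    // Counts the nodes in a tree; handy for checking the result.
    static int size(TreeNode n) {
        return n == null ? 0 : 1 + size(n.left) + size(n.right);
    }
}
```

Even in this stripped-down form, the application has to know the table layout, walk the tree itself, and copy every column into an object member, and a matching routine would be needed to write the tree back.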

Code of equal complexity would have to be constructed to store the tree into the database. Code would have to traverse the tree's member objects. Meanwhile, given that the root of the tree is in the object BTRoot, the equivalent code in db4o (for storing the entire tree) would be:

db.set(BTRoot);

Listing 5. Writing a complex object in a db4o database.

The additional code required to fetch and store objects in a relational database arises from the oft-cited “impedance mismatch” between the relational and object paradigms. This additional code is absent from the db4o applications.

Intelligent API
Of course, a database library cannot simply boast a small memory footprint. The library's API must be well-chosen and efficiently implemented, so that functionality is not sacrificed on the altar of memory economy. The API should be “as simple as possible, but no simpler”, to borrow a famous adage.

Db4o's API is surprisingly compact. In many cases, only a handful of methods are needed to perform the majority of database operations. Once a db4o ObjectContainer is opened, the following methods handle the functions of adding, updating, deleting, and searching:

1) set(object) adds a new object to the database, or updates an existing object.
2) delete(object) deletes an existing object from the database.
3) get(templateObject) fetches objects from the database.

The above methods assume that the application is using db4o's QBE querying mechanism. Db4o has two other query techniques, each suited to different circumstances. We won't go into the details of those other mechanisms here, but suffice it to say that they cover even the most complex querying requirements.

Reduction of Complexity
The use of an ODBMS like db4o reduces the complexity of the final application in other, not so obvious ways. For example, the db4o database library is housed in the same process space as the application. This allows the library to manipulate application objects directly, and eliminates any marshalling code required to pass data to a database engine executing in another process. (It also eliminates the memory space and processor cycles that would otherwise be consumed by the inter-process communications used to 'connect' the application and the database engine.)

Admittedly, this is neither a requirement nor a mandatory characteristic of an ODBMS; many object database systems (even db4o, in fact) can operate in client/server fashion, dividing application logic from database logic. Nor is a single code-space architecture impossible for an RDBMS-based system. However, as a relational database uses a decidedly different representation of data than the form that data takes in the application's objects, a sort of bicameral structure is natural, with the SQL engine on the other side of an imaginary divide between it and the application. Many relational database systems separate the application and database into different processes.

Another easily overlooked benefit is the fact that db4o keeps its database in a single file. In the world of enterprise applications, where disk storage is measured in hundreds of gigabytes and files are numbered in the thousands, a one-file database is no advantage at all. But it's a different story on an embedded system with limited filesystem resources.

A single-file database reduces “clutter” on the destination device. In addition, the database is more easily installed, backed up, or copied, because it's all in one place. Put simply, there are fewer pieces to keep track of. By contrast, some relational database systems create a subdirectory for each database, and store individual tables in separate files. This is another effect of the relational paradigm; each table stores well-defined rows, so it makes sense to separate tables in the filesystem.

Zero Administration
The actual computer driving many embedded systems is hidden from the user. There may be no mouse, a keypad instead of a keyboard, and LEDs instead of an SVGA display. User interaction with the system is strictly limited to the system's function. Consequently, the system must operate with zero administration.

You might say that we would prefer the database in an embedded application to simply “be there, and do what it's told.”

From a developer's perspective, we would rather not have to write any code that involves “describing” the structure of our data for the benefit of the database. For a relational (or, more likely, an object-relational) database, such coding would take the form of a “schema file” that expresses the structure of our data in some formal language (sometimes a proprietary language, sometimes XML).

This schema file would be read by an interpreter to create the data definition language (DDL) code that builds the database to begin with. The interpreter might also create the interface code that reads and writes database objects. (Such code would be the equivalent of what we did by hand in Listing 2, earlier.)

We would prefer, again from a developer's perspective, to simply put an object in the database without having to tell the database what the object looks like. With an ODBMS like db4o, we don't have to build any schema files because, in a real sense, we already have. The class definitions in the source code are the database schema.

To put an object in the database … you just put the object in the database. As a result, we don't have to resort to anything like SQL's DDL (data definition language) to define the architecture of persistent storage. There is no “initialization” code that we need to write that constructs tables, defines columns, supports relationships, and so on.
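One way to picture “the class definition is the schema” is that everything a schema file would spell out can be read from the class itself at runtime. The sketch below does so with plain Java reflection. It says nothing about how db4o works internally; SchemaDatapoint and SchemaSketch are hypothetical names, and skipping static and transient fields is a convention we assume here for the illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

// The Datapoint fields from Listing 1, plus two fields that are not object state.
class SchemaDatapoint {
    public int deviceID;
    public int sensorID;
    public java.sql.Timestamp readingTime;
    public float value;

    static int instances;      // static: belongs to the class, not to an object
    transient int cacheHint;   // transient: assumed not to be persisted
}

// Hypothetical helper: derives a schema description from a class definition.
class SchemaSketch {

    // Maps each persistent field name to its declared type, sorted by field name.
    static Map<String, String> describe(Class<?> type) {
        Map<String, String> schema = new TreeMap<>();
        for (Field f : type.getDeclaredFields()) {
            int mods = f.getModifiers();
            if (Modifier.isStatic(mods) || Modifier.isTransient(mods)) continue;
            if (f.isSynthetic()) continue;       // skip compiler-generated fields
            schema.put(f.getName(), f.getType().getSimpleName());
        }
        return schema;
    }
}
```

Calling SchemaSketch.describe(SchemaDatapoint.class) yields the four data members and their types, which is the same information a hand-written CREATE TABLE statement would have had to repeat.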

Change Tolerance
Closely tied to the concept of zero administration is “change tolerance.” A “change tolerant” database is one that easily manages alterations in the structure of the persistent data it stores. For example, if we modify an application so that an additional class is made persistent, we would prefer that the database need no alterations to accommodate the change.

An ODBMS, such as db4o, will accept a new class of objects easily … transparently, in fact. Suppose an embedded application using db4o has stored only objects of class A in the database. For whatever reason, a time arrives at which the application must begin storing objects of class B in the database. What changes have to be made to the database? None; the application simply begins storing B objects, and the db4o database engine takes care of all the behind-the-scenes work.

Contrast this with an RDBMS as the back-end database. A change in the kinds of objects stored in the database would likely necessitate the creation of a new table (that, in turn, means that DDL code must be written and executed to create the new table).

What about a change in an existing class structure? Suppose, for example, that a later version of a given application modifies a class by adding a new data member. Objects instantiated from the 'new' class will possess an additional data element, as compared to those objects of the same class already in the database.

An RDBMS-based application will have to either create a new table, and translate the old into the new, or modify the existing table, and fill the new fields (of the 'old' objects) with default values. In either case, the application's developers must construct code, both SQL and application code, to manage the upgrade.

With an ODBMS like db4o, changing an object's structure requires little or no database-specific code. 'Old' and 'new' objects of the same class can coexist in the same database. When an 'old' object is fetched, db4o instantiates the object into the 'new' class and fills in the missing fields with default values (zeros for numeric, byte, and char data; empty arrays for arrays; and nulls for everything else).
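The effect can be imitated in a few lines of plain Java. In this sketch (an illustration of the idea only, not db4o code) a stored 'old' object is represented by a hypothetical map of field names to values, and EvolvedDatapoint is an assumed 'new' version of a class that has gained two members since the object was written. Fields with no stored value simply keep their Java defaults.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;

// Hypothetical 'new' version of a persistent class.
class EvolvedDatapoint {
    public int deviceID;
    public float value;
    public int unitCode;   // added in the 'new' version of the class
    public String label;   // added in the 'new' version of the class
}

class EvolutionSketch {

    // Builds an object of the current class from stored field values.
    // Fields that are absent from the stored record keep their Java defaults.
    static <T> T instantiate(Class<T> type, Map<String, Object> storedFields) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T obj = ctor.newInstance();
            for (Field f : type.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers())) continue;
                if (!storedFields.containsKey(f.getName())) continue; // keep default
                f.setAccessible(true);
                f.set(obj, storedFields.get(f.getName()));
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

An 'old' record holding only deviceID and value comes back as a full EvolvedDatapoint whose unitCode is 0 and whose label is null, with no migration code written by the application developer.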

Writing such an object back to the database causes that object to become a 'new' version. Over time, then, old objects are silently transformed into new objects. Hence, the database can keep pace with evolving object structures invisibly and, because the database need not be reconstructed wholesale in response to a class structure change, upgrading deployed systems is easier.

For more complicated object evolutions, db4o provides callback methods that allow application code to intercept objects of specified classes on their way to and from the database. The callback method can thus identify old objects, and populate the new data members with values other than the defaults. In addition, because the alteration is made in a callback, it is isolated from the remainder of the application code, yielding a more readable (and maintainable) application.

Efficient Use of Persistent Storage
If data in the database is going to see a lot of turnover, the database must manage deleted space reclamation. Historically, this has been an area of weakness for object databases, given that a single database may store objects of different size and structure. Meanwhile, a relational database system has the advantage that every row in a table is composed of the same kinds of columns. (Sometimes, a table's rows might even be of fixed length.)

Db4o is closing the gap on this advantage that an RDBMS has over an ODBMS. Currently, db4o does reuse deleted object space, but a 4-byte 'leak' occurs each time an object is deleted. Db4o's developers are working on an upcoming version that should eliminate this leak.

In addition, db4o provides (separate from the database library) the source for a defragmentation class. This source can be woven into your code so that you can defragment the database at a time when doing so does not affect the embedded application's activities.

Other Considerations
Other aspects of db4o make it worth consideration in an embedded application. For example, db4o provides built-in synchronization (referred to as “replication” in db4o documentation). This feature is EXTREMELY useful for embedded applications running on 'remote' devices, whose data must be periodically exchanged with some central database. In fact, for such applications, this capability simply MUST be present … unless the developer wants to copy the database over wholesale, and resolve differences on the destination.

You enable replication on a db4o database when you create the database. Objects added to the database are given UUIDs, and each time a persistent object is modified, a transaction number is associated with the object. Suppose now that a separate database is created (with replication enabled), and objects from the first are replicated into the second.

When the objects are moved into the new database, their UUIDs and transaction numbers follow them. Later, when the databases are synchronized (the objects in the second database are reconciled to those in the first), db4o's replication code can determine, via the UUIDs and transaction numbers, which objects have been modified, and which have not.

For each modified object, db4o calls a conflict resolution callback routine in your code, which examines the two objects in question and determines which object is the 'winner'. The process is conceptually straightforward, and having the bulk of the work done for you by the database engine makes implementing synchronization quick and easy.
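The bookkeeping behind this process can be sketched in ordinary Java. The code below is not db4o's replication API: Versioned and SyncSketch are hypothetical names, each database is reduced to a map from UUID to a payload plus transaction number, and, as a simplification, any difference in transaction numbers is treated as a modification to be handed to the resolver callback.

```java
import java.util.Map;
import java.util.UUID;
import java.util.function.BinaryOperator;

// Hypothetical replicated object: a payload plus the transaction number
// recorded when the object was last modified.
class Versioned {
    final long txn;
    final String payload;

    Versioned(long txn, String payload) {
        this.txn = txn;
        this.payload = payload;
    }
}

class SyncSketch {

    // Reconciles 'remote' with 'central'. Objects are paired by UUID; a pair
    // whose transaction numbers differ is passed to the resolver (central
    // copy first, remote copy second) and the winner is written to both
    // sides. Returns the number of pairs the resolver was asked to settle.
    static int synchronize(Map<UUID, Versioned> central,
                           Map<UUID, Versioned> remote,
                           BinaryOperator<Versioned> resolver) {
        int resolved = 0;
        for (Map.Entry<UUID, Versioned> entry : remote.entrySet()) {
            Versioned theirs = entry.getValue();
            Versioned ours = central.get(entry.getKey());
            if (ours == null) {                    // created on the remote device
                central.put(entry.getKey(), theirs);
                continue;
            }
            if (ours.txn == theirs.txn) continue;  // unmodified on both sides
            Versioned winner = resolver.apply(ours, theirs);
            central.put(entry.getKey(), winner);
            entry.setValue(winner);
            resolved++;
        }
        return resolved;
    }
}
```

A resolver such as (ours, theirs) -> ours.txn >= theirs.txn ? ours : theirs implements a “latest change wins” policy; an application could just as easily inspect the payloads and decide on some other basis.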

Rick Grehan is a QA Engineer for Compuware's NuMega Labs in Merrimack, NH. He has been programming for nearly 30 years, and has written software in languages ranging from Forth to Fortran, 8-bit BASIC to Java, and 6502 assembly language to PHP. He is also a freelance writer. His articles have appeared in BYTE Magazine, JavaPro, Linux Journal, The Microprocessor Report, Embedded Systems Journal, and others. He has also co-authored three books: one on RPCs, another on embedded systems, and a third on object databases in Java. He can be contacted at regrehan@hotmail.com.

