Local Device Search - The next embedded consumer killer app? - Embedded.com

Search, especially in small-footprint embedded devices in the consumer market, is non-trivial to implement, particularly when it is applied to more than just alphanumerics. Savvy embedded developers are starting to take advantage of a new breed of database management systems designed specifically for devices.

They fit in a small footprint at run time and offer not just a high-level query language, but also shared access to data and guaranteed database consistency even after unexpected power outages.

Such features are becoming important in many embedded consumer apps because a number of converging trends act together to constantly multiply the amount of content.

Exploding mobile content
In the movies space, the advent of cheap digital film cameras and editing equipment has led to an explosion in amateur video. YouTube is the definitive example. Following the general decline in people's attention spans, a "typical" video program is rapidly shifting from the studio-generated 25- or 50-minute program to the amateur-produced 2- or 3-minute clip.

Instead of content coming only from a handful of studios, videos are being produced by tens of thousands of individuals and small groups. The result: millions of pieces of video content are being created and distributed every year.

Similar trends in low-cost development and distribution are driving an explosion in still photography and musical content. Even the volume of commercially produced broadcast content is exploding. Satellite and cable packages may now contain hundreds of channels. IPTV promises to increase this even further.

Why should an embedded developer care about this? Surely Google has the search problem under control? Well, yes, but only if you are searching content on the Web, and only if you have an Internet connection, which can't always be assumed.

Increasingly, content is being downloaded to handheld portable devices. These devices are now storing so much data that finding the content that the user wants is becoming a challenge. Here's a subversive thought: Was the success of the iPod Shuffle based on the fact that actually finding music on an MP3 player is a bit of a chore, so it is easier to just randomize it?

Integrating the Data
In one sense the MP3 player presents a simple search challenge: the task is to search a single collection of data to find the target. Looming on the horizon like a threatening storm cloud is the inconvenient fact that devices increasingly store many kinds of information, and the real value comes from integrating all that data.

Take the case of a mobile device carried by a field support engineer. It needs to integrate customer data, product data, service data, location data, and inventory data to be able to answer an obvious question like this one: "Find me a nearby customer with an open service order on a product that I am certified to maintain and where I already have the likely parts in my truck."

Moore's Law to the Rescue
Of course the increases in device storage come along with increases in RAM capacity and processing power, driven by the so-far inexorable Moore's law, popularly summarized as a doubling in compute power roughly every 18 months. This increased power can be used to drive increasingly sophisticated search, and users of both consumer and commercial devices are going to want it. The question is how best to provide it.

The obvious way is just to code search into the device application. But this is not as simple a solution as it seems, nor is it necessarily the best use of scarce development resources. This article takes the position that the optimal way to provide search is to embed into the application a COTS (Commercial Off The Shelf) relational database management system (RDBMS) that is optimized for the device environment.

The reason is straightforward: searching across multiple sets of shared, updateable data is hard to do well. Training in device software development is not necessarily a good background for writing a database manager. While it is reasonably easy to write a search solution for a given set of requirements, it is very hard to write an efficient, compact, general-purpose data manager.

Even if a very talented development team could accomplish this task, why would a manager want to spend the team's resources on invisible infrastructure instead of focusing on adding value within the team's core competency? Just as embedded developers in almost all cases use COTS operating systems rather than writing their own, so the time has come to use COTS data management rather than writing code for this function.

Desirable Database Features: Choosing a DBMS
There are three fundamental kinds of data manager available to the device developer: data management libraries, object database management systems (ODBMS), and relational database management systems (RDBMS). It is important to choose the right tool for the task at hand.

A data management library is useful for the storage and management of simple data sets. An application that saves the preferences of several users could benefit from this approach. An example could be the code that manages seat and mirror positioning in a passenger vehicle, retaining the settings for several drivers. The application is simple, the data is simple, and the application can be written faster using a simple data management library.

An ODBMS is designed to provide more or less transparent persistence for application objects. An ODBMS can make object-oriented programming in a language like Java much simpler because it takes care of moving objects between persistent storage and RAM. While some ODBMS provide limited search capability, this is not their strength. Objects are normally retrieved because the application knows which objects it wants.

An RDBMS, on the other hand, is designed for content-based search. RDBMS are based on SQL (Structured Query Language), a set-oriented language that provides the ability to retrieve a record based on the value of any of its fields. This makes it the perfect choice for a device search application.

The Enterprise Database Grows Down
Historically, RDBMS have been the data manager of choice for enterprise applications, and they were designed for the enterprise data center environment. They demand large, powerful machines and frequent attention from database administrators.

Fortunately, a new class of self-managing RDBMS is appearing on the market with a much smaller footprint than their enterprise-class predecessors. Focusing on the subset of SQL most suited to device applications, and sometimes offering advanced search for device datatypes like text and spatial, an embedded RDBMS will fit into a megabyte of RAM or even less at run time.

For the first time, it is possible to think of embedding a relational database management system into a device application, and there are compelling time-to-market considerations that encourage embedded software developers to do just that.

Let's look at what a relational database management system has to offer.

Content-Based Search
The SQL language provides a simple search interface to data. In SQL, the application finds data by means of its content, not its location. An RDBMS stores data in tables made up of rows and columns. Rows are retrieved because the content of one or more columns matches the values in the query.

For example, information about music albums could be stored in a table with one row per album and columns for the album's name, artist, label, and year.

In SQL, you would create this table with a statement like this:

CREATE TABLE Albums (
       Album_name    VARCHAR(254),
       Album_artist  VARCHAR(254),
       Album_label   VARCHAR(254),
       Album_year    SMALLINT)

You could find all albums by a given band using a query like this:

SELECT Album_name
FROM Albums
WHERE Album_artist = 'Jefferson Airplane'

To make this query execute quickly, you could create an index on the Album_artist field:

CREATE INDEX Album_artist ON Albums(Album_artist)

Once the index is created, the RDBMS maintains it and will use it automatically to speed searches on that field.
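
The statements above can be tried out with a minimal sketch like the following. It assumes SQLite, driven through Python's standard sqlite3 module, as a stand-in for an embedded RDBMS; it is not the product this article has in mind, and the three sample rows are illustrative. The query plan is printed to show whether the engine chooses the index on its own.

```python
import sqlite3

# In-memory SQLite database, used here only as a convenient stand-in.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Albums (
        Album_name   VARCHAR(254),
        Album_artist VARCHAR(254),
        Album_label  VARCHAR(254),
        Album_year   SMALLINT)""")

# Illustrative sample rows.
conn.executemany("INSERT INTO Albums VALUES (?, ?, ?, ?)", [
    ("Surrealistic Pillow", "Jefferson Airplane", "RCA Victor", 1967),
    ("Volunteers", "Jefferson Airplane", "RCA Victor", 1969),
    ("Abbey Road", "The Beatles", "Apple", 1969),
])

# The index from the article; the engine maintains it from here on.
conn.execute("CREATE INDEX Album_artist ON Albums(Album_artist)")

query = ("SELECT Album_name FROM Albums "
         "WHERE Album_artist = ? ORDER BY Album_year")
artist = ("Jefferson Airplane",)

# Content-based search: rows are selected by value, not by location.
names = [row[0] for row in conn.execute(query, artist)]

# Ask SQLite how it would run the query; column 3 is the plan text.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, artist)]

print(names)
for step in plan:
    print(step)
```

The application never mentions the index in the SELECT statement; choosing it is the query planner's job.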

Integrating Data
An RDBMS enables you to integrate data from many tables using a join. A join connects columns of two or more tables using matching column values. This is useful for a couple of reasons.

One is that it enables you to cross-reference data stored in different tables, perhaps by different applications. The second is that you can reduce the storage requirements by storing each piece of data only once and then cross-referencing it where it is needed.

In the Albums table above, the artist name is stored many times, once for each of the artist's albums. We can eliminate this redundancy by separating the artists out into a table of their own. We allocate each artist an arbitrary id so that we can perform the cross-reference.

To do this, you would execute SQL like this:

CREATE TABLE Albums (
       Album_name    VARCHAR(254),
       Album_artist  SMALLINT,
       Album_label   VARCHAR(254),
       Album_year    SMALLINT)

CREATE TABLE Artists (
       Artist_name   VARCHAR(254),
       Artist_id     SMALLINT)

To find all the albums by a given artist, you join the tables together, using the Artist_id:

SELECT Album_name
FROM Albums, Artists
WHERE Albums.Album_artist = Artists.Artist_id
AND Artist_name = 'Jefferson Airplane'

This powerful technique enables an application to cross-reference any data stored in a database, merely by specifying that field contents should match.
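
As a rough, runnable illustration of the join, the sketch below again assumes SQLite via Python's sqlite3 module as a stand-in engine. The artist ids and sample rows are illustrative values chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine, not a specific product

# The two normalized tables from the article.
conn.execute("""
    CREATE TABLE Albums (
        Album_name   VARCHAR(254),
        Album_artist SMALLINT,
        Album_label  VARCHAR(254),
        Album_year   SMALLINT)""")
conn.execute("""
    CREATE TABLE Artists (
        Artist_name VARCHAR(254),
        Artist_id   SMALLINT)""")

# Each artist name is now stored exactly once, under an arbitrary id.
conn.executemany("INSERT INTO Artists VALUES (?, ?)", [
    ("Jefferson Airplane", 1),
    ("The Beatles", 2),
])
conn.executemany("INSERT INTO Albums VALUES (?, ?, ?, ?)", [
    ("Surrealistic Pillow", 1, "RCA Victor", 1967),
    ("Volunteers", 1, "RCA Victor", 1969),
    ("Abbey Road", 2, "Apple", 1969),
])

# The join: rows pair up wherever the id columns hold matching values.
rows = conn.execute("""
    SELECT Album_name
    FROM Albums, Artists
    WHERE Albums.Album_artist = Artists.Artist_id
      AND Artist_name = ?
    ORDER BY Album_year""", ("Jefferson Airplane",))
albums = [row[0] for row in rows]
print(albums)
```

Renaming an artist now means updating a single row in Artists rather than every matching row in Albums.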

Transactions
Central to maintaining data integrity is the idea of transactions. In an RDBMS, a transaction is a collection of statements that either execute completely or not at all. The classical transaction is a transfer between a checking and a savings account. Both the debit and the credit should happen, or neither should happen. There should never be a condition when only one or the other has taken place.

An RDBMS provides simple semantics to signal the beginning of a transaction and to either commit the set of actions or roll them back. The RDBMS makes four guarantees about transactions; these are known as the ACID properties: a transaction is Atomic, Consistent, Isolated and Durable.

Atomic means that the transaction succeeds or fails as a unit. Consistent refers to the fact that a transaction may not violate database integrity rules: if the checking account debit would result in a negative balance, and negative balances are illegal, then the transaction will not take place. Transactions are Isolated so that other applications cannot get an inconsistent view of the data by seeing partial results midway through transaction execution, and they are Durable because, once committed, they survive power failure and reboot.

Even in a single-user or single-application environment, transactions are useful in protecting database integrity from errors caused by such things as an unexpected loss of power during a sequence of actions or a media error. But they are essential in the more complex environment of many modern devices in which the data is shared by many applications.
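
The checking-and-savings transfer can be sketched as follows, again assuming SQLite through Python's sqlite3 module as a stand-in. The Accounts table, its CHECK constraint, and the transfer() helper are hypothetical names invented for this illustration. The credit is deliberately applied first, so that a failing debit leaves a partial update that the rollback has to undo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine for illustration

# Hypothetical schema: the CHECK constraint is the integrity rule that
# makes negative balances illegal.
conn.execute("""
    CREATE TABLE Accounts (
        name    TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0))""")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)",
                 [("checking", 100), ("savings", 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move `amount` between accounts; both updates apply or neither does."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if an exception escapes from it.
        with conn:
            conn.execute(
                "UPDATE Accounts SET balance = balance + ? WHERE name = ?",
                (amount, target))
            conn.execute(
                "UPDATE Accounts SET balance = balance - ? WHERE name = ?",
                (amount, source))
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit was undone.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM Accounts"))


ok = transfer(conn, "checking", "savings", 30)       # fits within the balance
after_ok = balances(conn)
failed = transfer(conn, "checking", "savings", 500)  # would overdraw checking
after_failed = balances(conn)
print(ok, after_ok, failed, after_failed)
```

After the second call the balances are the same as before it, even though the credit to savings had already been executed when the debit failed.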

Time to Market
Maintaining data structures that reliably support efficient and controlled access to shared data is a complicated business. Database management systems are very sophisticated pieces of software built by engineers who specialize in this arcane branch of computing science.

Now that small-footprint, self-managing RDBMS are available, it makes a lot more sense to embed an RDBMS in an application than to build data management logic from scratch. Once developers are relieved of the need to attend to the fine points of data management, they can focus on delivering the features that win customers. The result is a more robust, richer application delivered to market faster. And in today's competitive markets there are few second prizes. Getting to market fast is often the key to commercial success.

Embedded Application Optimizations
The enterprise RDBMS was created to run back-office business, and so it is heavily focused on support for alphanumeric data. But embedded applications often need to deal with text search and spatial search. Some modern embedded RDBMS provide extensions to support these datatypes, bringing to them the same high-level query interface that standard SQL provides for alphanumerics.

Each of these datatypes requires a new kind of index. Alphanumerics are scalar data: they can be distributed along a line. The B-tree indexing used by enterprise RDBMS is in effect a way to do a binary search along this line. Spatial data is 2- or 3-dimensional and cannot be efficiently searched using B-trees. Thus, a DBMS that relies only on B-trees cannot efficiently answer a simple question like "Find me the points of interest that lie within this circle." The spatial search engines that power sites like Yahoo Maps use special-purpose search algorithms, not enterprise RDBMS.

Some embedded RDBMS now provide Quad Tree indexing that allows efficient search of spatial data. Because of the ability of an RDBMS to integrate data from multiple tables, direct support for spatial data within an RDBMS enables an application to treat geography as just another source of information that can be joined to other data within the database. It becomes easy to ask questions like, "Show me the names of people who have sent me a text message recently and who are in this shopping mall."
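
To give a feel for how a quadtree speeds up a "points within this circle" query, here is a minimal point-quadtree sketch in plain Python. It is an illustration of the general technique under simplifying assumptions, not the index of any particular embedded RDBMS; the QuadTree class, its parameters, and the sample points of interest are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class QuadTree:
    """Square region centred on (cx, cy) that extends `half` in each direction."""
    cx: float
    cy: float
    half: float
    capacity: int = 4  # points a leaf holds before it splits
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def contains(self, x, y):
        return (abs(x - self.cx) <= self.half and
                abs(y - self.cy) <= self.half)

    def insert(self, name, x, y):
        if not self.contains(x, y):
            return False
        if not self.children:
            # Store in this leaf while there is room (or when the square is
            # too small to be worth splitting any further).
            if len(self.points) < self.capacity or self.half < 1e-6:
                self.points.append((name, x, y))
                return True
            self._split()
        return any(child.insert(name, x, y) for child in self.children)

    def _split(self):
        # Create four equal sub-squares and push the stored points down.
        h = self.half / 2
        self.children = [
            QuadTree(self.cx + dx * h, self.cy + dy * h, h, self.capacity)
            for dx in (-1, 1) for dy in (-1, 1)]
        old, self.points = self.points, []
        for name, x, y in old:
            for child in self.children:
                if child.insert(name, x, y):
                    break

    def query_circle(self, qx, qy, r):
        # Prune: skip the whole square if the circle cannot reach it.
        dx = max(abs(qx - self.cx) - self.half, 0.0)
        dy = max(abs(qy - self.cy) - self.half, 0.0)
        if dx * dx + dy * dy > r * r:
            return []
        found = [name for name, x, y in self.points
                 if (x - qx) ** 2 + (y - qy) ** 2 <= r * r]
        for child in self.children:
            found.extend(child.query_circle(qx, qy, r))
        return found


# Hypothetical points of interest on a 100 x 100 map.
pois = [("cafe", 10, 10), ("museum", 12, 14), ("park", 80, 80),
        ("station", 15, 9), ("library", 55, 45), ("harbour", 90, 20)]
tree = QuadTree(cx=50, cy=50, half=50)
for name, x, y in pois:
    tree.insert(name, x, y)

nearby = sorted(tree.query_circle(12, 12, 5))
print(nearby)
```

The pruning test at the top of query_circle is where the saving comes from: whole quadrants that the circle cannot touch are discarded without examining any of the points inside them.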

With more mobile devices becoming location aware, either through GPS or some other location technology, new opportunities are arising to deliver services that leverage knowledge of the location of the device and its surrounding environment. These applications may even be able to operate when the device is disconnected from the network, supporting emergency services or field workers in distant locations.

An unusual application of spatial search is to use it to locate media content. With devices like MP3 players and Personal Video Recorders (PVRs) storing thousands of items of content, the classical folder-based interface breaks down.

Users need a more intuitive way to find desirable content. One key is quantitative tagging. Content is tagged (by the user, the content provider, or a community of users) using a number of quantitative metrics.

In the case of movies, this could be the complexity of the plot, how scary the movie is, what ages it is suitable for, and so on. In effect, this scatters media across a multi-dimensional space that can be searched using spatial queries, enabling the user to find suitable content without knowing the name or location of the media.
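
A rough sketch of the idea follows, assuming SQLite via Python's sqlite3 module as a stand-in. The Movies table, its column names, the 0-to-10 scores, and the clip titles are all invented for illustration. Plain range predicates play the part of the spatial query here; an engine with a spatial index could answer the same question without scanning every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine for illustration

# Hypothetical schema: each quantitative tag is one axis of the space.
conn.execute("""
    CREATE TABLE Movies (
        Title           VARCHAR(254),
        Plot_complexity SMALLINT,   -- 0 (simple) to 10 (intricate)
        Scariness       SMALLINT,   -- 0 (gentle) to 10 (terrifying)
        Min_age         SMALLINT)""")
conn.executemany("INSERT INTO Movies VALUES (?, ?, ?, ?)", [
    ("Clip A", 2, 1, 0),
    ("Clip B", 8, 7, 16),
    ("Clip C", 3, 2, 8),
    ("Clip D", 3, 9, 18),
    ("Clip E", 7, 2, 12),
])

# "Something about this complex and this scary, suitable for a 10-year-old":
# search a box around the target point, closest matches first.
wanted = {"plot": 3, "scary": 2, "radius": 2, "age": 10}
rows = conn.execute("""
    SELECT Title
    FROM Movies
    WHERE Plot_complexity BETWEEN :plot - :radius AND :plot + :radius
      AND Scariness BETWEEN :scary - :radius AND :scary + :radius
      AND Min_age <= :age
    ORDER BY (Plot_complexity - :plot) * (Plot_complexity - :plot)
           + (Scariness - :scary) * (Scariness - :scary)""", wanted)
matches = [row[0] for row in rows]
print(matches)
```

The user never supplies a title or a folder; the content is found purely from where it sits in the tag space.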

Malcolm Colton is Vice President, Sales and Marketing/Deputy General Manager in the Embedded Business Group at Hitachi America, Ltd.
