On to objects - Embedded.com

On to objects

Avoid thrashing memory and eating clock cycles when using complex objects, such as vectors and matrices, in your code.

Lately we've been looking at software functions to perform operations on mathematical entities known as vectors. My goal is to give you functions that allow us to operate on both vectors and matrices, in various combinations. Perhaps it's time for me to recap where we've been and where we're headed.

First, a clarification. To a programmer, a vector looks a lot like any other array. Both of them are one-dimensional array of numbers, any element of which can be accessed. But the vector is mathematically different; it's associated with rules of behavior under math operations like addition and multiplication. Likewise, a matrix looks like a two-dimensional array, but it's got its own rules of engagement.

For many years, I've longed for a programming language that would allow me to write math assignments containing vectors and matrices, in a natural way that looks much like the math equations that they represent. For example, I might have:

y = H -1 (Ax +Bu ) (1)

where x , y , and u are vectors (not necessarily the same size), and A , B , and H are matrices. I'd love to be able to program the equation in a similar way, perhaps:

y = (A*x+B*u)/H   

(Note to the experts: check out the cool matrix inverse.)I've been trying for many years to reach this nirvana. My very first function in a high-order language was a Fortran subroutine to add two vectors. But I couldn't write:

c = a + b   

I had to settle for:

CALL VADD(A, B, C)   

In a previous column, you saw the modern equivalent in the C/C++ function:1

vAdd(a, b, c);   

That function is part of the file SimpleVec.cpp , which you saw last time (posted online at www.embedded.com/code/2007code.htm). Except for syntax, it's virtually identical to the Fortran subroutine. The advent of C++ has changed the picture entirely. Because of its unique support for both user-defined types (objects) and operator overloading, a C++ statement like y = (A*x+B*u)/H is literally possible.Possible, but not always either easy or efficient. Anyone who's ever looked at how a compiler works knows that to evaluate complicated expressions the compiler has to create temporary variables. A statement like:

x = (a+b)/(c*d-f)   

Might be expanded into:

T1 = a+b
T2 = c*d
T3 = T2-f
x = T1/T3

As long as the objects being manipulated are simple types like integers and floats, creating and discarding these temporaries is a trivial matter. But when they're complex objects like vectors and matrices, it's not. If we're not careful, we end up thrashing the memory manager and eating clock cycles in the process.After pondering this dilemma for some time, I realized the obvious solution: give the programmer alternatives. Let him have high efficiency when he wants it, but let him have programming simplicity, too. The C++ statement:

c = a + b;   

is handy, but costly, because a temporary vector must be created. On the other hand, the statement:a += b is much more efficient, and the Fortranish:vAdd(a, b, c) better yet. If that's still not good enough, we can always define vAdd as a macro and let the compiler optimize the code even further. The trick, then, is to provide a layered approach covering the gamut of needs. That's where we're going. The file SimpleVec.cpp represents just the first step. Now it's time to take the second, which begins with a proper C++ vector class.

The vector class
As most of you know, the syntax of C++ classes requires the following body plan (dare I say template?):

class Vector{    public:    Vector( );    ~Vector( );};   

Here I've only shown two member functions, which are the default constructor and destructor. Everything else, including the content of these functions, depends upon our choice of how to represent the data.If we knew that all the vectors were going to be the same length—most likely length 3—the decision would be a slam-dunk. Declare:

double v[3];   

and move on. The resulting code would be very, very efficient. The constructor and destructor can also be left empty. Indeed, in past lifetimes I've been known to create a separate class for 3-vectors. Mixing them with other vectors in various combinations, say in a copy assignment, can be a little messy, but manageable.This time, however, I'm going to stick to my plan of the last column and allow for vectors of any length, with a default length of three.2 This should let me illustrate the points without getting bogged down in conversion minutiae. Note this point, however, as one you might want to revisit to squeeze out more efficiency.If different vectors can have different lengths, we're led inexorably to dynamic memory allocation. We also need to record the size of each vector. So the class data must look something like this:

int sz;
double *p;

Because the class now relies on dynamic allocation, we can be sure that we must define nontrivial functions for the constructor and destructor. My first cut at a vector class is shown in Listing 1. Note carefully the size parameter with default value in the constructor.

View the full-size imageThis is important. C++ rules require that every class have a default constructor. If we don't provide one, the compiler will make up one that does something reasonable. But we're not likely to like the compiler's choice, because it won't be using any dynamic memory allocation.However, the compiler's definition of “default” is a loose one. To the compiler, a default constructor is one that doesn't require any input parameters. My constructor fits that requirement; since the value of n can default, the compiler is happy with calling it with no argument.

The init function
I've used code like Listing 1 in the past. Unfortunately, it doesn't always work. The reason is buried down in the bowels of C++, and it's worth teasing out.

Whenever we declare a variable of type Vector , the compiler has to set aside memory for it. That's what the constructor is for, and the one shown in Listing 1 works fine for this. If we specify the size, the compiler will call the constructor with the appropriate value of sz ; if we don't specify it, the compiler will call what it sees as the default constructor, which will allocate a vector of size 3.

So far, so good. But what if I need an array of vectors, like:

Vector x[4];   

Note carefully, I'm not asking the compiler for one vector of size 4; I'm asking for an array of four vectors. The four could even have different sizes. But how do we tell the compiler what size to make them? I'd like to be able to say, for example:

Vector(6) x[4];   

In my pidgen syntax, I'd be saying “build me an array of four vectors, each of size 6.” But the C++ language doesn't accept this syntax or any similar construct. Although it will let us pass arguments to constructors all day long, it won't use them as the default constructor. If we want an array of vectors, they can have any size at all, as long as it's 3.Ahead to the past
Once upon a time, Borland International introduced a language called Pascal With Objects. It was a very nice language. I thought it was remarkable because Borland was able to persuade Pascal to handle objects, using only a very few new keywords.

Borland Pascal had constructors and destructors, but they had to be explicitly called by the programmer. There was no automatic invocation of the constructor.

At the time, many C++ fans gave this as a disadvantage of Borland Pascal. With C++, they argued, the constructor and destructor are called automatically, so you don't have to worry about forgetting to call them.

Which would be great, of course, if the compiler knew enough to properly construct the object when it sees the declaration. The Borland folks weren't sure this would always be the case. You've just seen one example that seems to prove them right.

So what do we do? Answer: the same thing Borland did. We need a sort of “callable constructor” to finish the job that the constructor could not. We need to take the responsibility for dynamic memory allocation away from the compiler-called constructor. In Listing 2, I've shown a callable constructor called Init( ) , along with the new default constructor. Just don't forget that before you use the vectors in this class, you have to explicitly initialize them.

View the full-size image

To me, this business of separating out initialization from construction makes sense, and it assures that the initialization of the vectors can be put off until we have the info needed to construct the vector properly.

Take a good look at function Init( ) . There are some subtleties that you might miss at first glance. First, the function is intended to be called to initialize newly declared objects. But this doesn't necessarily have to be the case. If a vector is already initialized, there's no harm in initializing it again. If the existing array is as large or larger than the new one, it'll be retained. All we lose is a few wasted elements, as sz is changed. If the old vector is too short, Init( ) will delete it and create a new one.

You might also notice that I clear the array using the SimpleVec function, vClear . This is obviously a frill. If we're going to store data into a vector, we don't need to start by storing zeros there first. But to me, clearing arrays is good practice, and I wouldn't skip the step unless I were really hurting for clock cycles.

Input/output
In my previous offering, SimpleVec.cpp , I pretty much saved the I/O for last. But a prime directive of the object-oriented paradigm, is that the class data items should only be accessed through functions that fetch or update them. If we are going to see the contents of the vectors at all, we're going to need such functions. Access to sz is easy:

int Vector::Size(void){    return(sz);}   

Access to the vector part is a little more problematic. Before I'm done with this class, I promise to have constructors and assignment statements mixing every kind of data type we can reasonably use. But one of the most important properties of any proper vector should be the ability to access individual elements of the vector. To allow this, I've used the overloadable operators, ( ) and [ ] . Each function returns a reference variable to the element, which means you can both read and change the element.The reason for using both is pretty cool. In modern languages like C, C++, and Pascal (also Mathcad), array indexing begins with zero, so the third element in the array is really x[2] . Many scientifically oriented languages, including Fortran and Matlab, start the counting at one. This is not a trivial matter. Virtually all textbooks on both vector/matrix math and science/physics, use algorithms that use the base-1 approach. Sure, the algorithms can always be converted, but doing so isn't easy and is fraught with error. Don't believe me? Then consider: one popular book on scientific programming retains the base-1 approach using C arrays, wasting the first element of each array rather than face the challenge of altering the algorithms.By using both the ( ) (for base 1) and [ ] (for base-0) operators, I can give the programmer both options. Such a deal. The code is shown in Listing 3. I realize that using operator ( ) in this sense is unconventional.

We'll talk more about this next time.

View the full-size image

Next time
That's all the code we have room for this month. The code I've arrived at so far is posted online at www.embedded.com/code/2007code.htm. It includes a test driver. For your convenience, I've copied the one SimpleVec function, vClear( ) , into the file, so you won't have to bother linking.

Next time, I'll give you that promised army of constructors, assignment operators, and stream I/O functions. After that, we'll start to focus on the math operations, which are the point of the whole thing. See you then.

Endnotes:
1. Crenshaw, Jack.
“Making the tough coding decisions,” Embedded Systems Design , December 2006, p.10.
Back

2. Crenshaw, Jack. “SimpleVec wrap,” Embedded Systems Design , February 2007, p.11.
Back

Reader Response


While you can't normally provide a non-default initializer for a plain-old array, as you say, you can use std::vector for this purpose:

std::vector v(numVectors, vectorSize);   

(This is equivalent to the desired Vector v[numVectors](vectorSize) but with the advantage of being actual C++.)If you insist on using plain-old array, you could use a template subclass to get the effect you desire:

templateclass VectorN: public Vector{public:    VectorN(): Vector(n_) {}};Vector v(10);VectorN<10> w[100];   

(Note: ~Vector should be virtual.)Actually, it's worth considering using a template to specify the size of a vector instead of using dynamic array (new[] , delete[]) . Some operators (e.g., scalar multiply) work regardless of size, but others can only be applied between like-sized vectors, for example. Define the operators in the base class when they work for all sizes and in the template when they require like-sizes or specific sizes.Have you thought about extending std::vector and just adding your mathmatical operators?

class Vector: public std::vector ...   

or

templateclass Vector: public std::vector...   

(allowing float and maybe int versions as well as double) (maybe even version)

By the way, Borland didn't separate init() from allocation because they thought they were better left separate. Indeed, they modified the allocation and deallocation routines to specifically enable combination with constructor/destructor calls, e.g., new(p, init()) .

Unlike in C++, you must explicitly call the constructor; but it is intended that you call it at allocation time. (You can explicitly specify which constructor you want, using distinct names rather than overloading; but you still cannot specify a constructor for array elements.)

While I'm talking Pascal, I must point out that Pascal arrays do not start at zero (as the column states). The programmer may specify any desired start and end indices, e.g., Array [10..20] of integer .

One last thing: you should use unsigned int sz (or size_t sz ); negative array sizes make no sense.

-Paul Hepworth
Draper, UT


Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.