If you can toss around matrices as you might a Frisbee, you're one of the superheroes. If not, you're on the sidelines, watching.
Now that we've completed the vector class, it's time to move on to the matrix (plural: matrices). In the worlds of math, engineering, and physics, it's the matrix that separates the men from the boys, the women from the girls, the heroes from the goats. If you can toss around matrices as you might a Frisbee, you're one of the superheroes. If not, you're on the sidelines, watching. Hint: if you say “matricee” as the singular of matrices, you're not even in the stadium.
I want to show you the C++ code that lets you toss around matrix-vector expressions, but at this point, I think it's much more important for me to give you the motivation.
• Why do I need a matrix?
• What good are they?
• How does matrix algebra help me?
John, Mary, et al.
Remember those word problems that used to drive you crazy in high school algebra? Here's an easy one, for old times' sake:
1. Between them, John and Mary have $1.00
2. John has three times as much money as Mary.
3. How much money does each child have?
Now, if you've been thinking outside the box, instead of having your eyes glaze over at the phrase “word problem,” you already know the answer. Think three quarters in John's hand, one in Mary's.
Maybe that one was too easy. Try this one:
1. Between them, John, Mary, and Sue have $1.00.
2. Sue has 40 cents more than John.
3. Sue has as much money as John and Mary together.
Not so obvious this time, is it? As you may remember, the trick is to write down equations that capture the information in the problem. To keep it brief, let J , M , and S represent the money each child has. Fact #1 says:
J +M +S =100         (1)
The next one says:
S =J +40           (2)
S =J +M         (3)
Now we need to solve for the individual values. As you may remember, the trick is to manipulate the equations so that we isolate a single variable, one at a time. For example, Equation 2 gives us a value for S . Let's substitute it into the other two equations, to get:
J +M +(J +40)=100      (4)
J +40=J +M
A little simplifying gives:
2J +M =60      (5)
Luckily, the terms in J in the last equation cancelled each other, isolating the value of M . We now know that Mary has 40 cents. Substituting that value into Equation 5 gives:
2J =60–40=20      (7)
Now that we know the value of J , we can solve Equation 2 for S :
S =50      (8)
The problem is solved. John has 10 cents, Mary 40, and Sue, 50.
I should probably point out that the sequence of steps I used to solve the problem is not at all the only one, or even the best one.
Way back when, we used to call problems like this one a problem in “simultaneous equations.” Simultaneous because, of course, the solution values must satisfy all of the given relationships–in this case Equations 1 through 3. As you remember this kind of problem, you will probably also remember that you must have as many equations as you have unknowns. Give me two equations in three unknowns, and I can't give you an answer. But give me three, and I can be sure to solve the problem (unless one of the equations gives us no new information). If you don't remember anything else about this kind of problem, at least remember this–it's the term “n equations in n unknowns.” And, as you can see, the solution comes by trying to isolate the unknowns, one at a time. As you find the value for one of the unknowns, you now have n –1 equations in n –1 unknowns, and you can continue to whittle the problem down to size.
A little order, please
Mathematicians have been solving problems like this for, oh, a few thousand years, give or take. For a long time, they were satisfied to treat each problem as it presented itself, trying operations pretty much at random, until they got the answer. But that approach leaves us very much dependent upon our cleverness in deciding which unknown to eliminate first, and how. A process that depends on the cleverness of the mathematician may help pump up the ego of that mathematician, giving the same satisfaction that he might get from solving, say, a crossword or Sudoku puzzle. But we're not talking about feeding egos and giving satisfaction here. There are real problems to be solved, and we'd rather not depend on heroes to do it.
Over the years, I've learned an important fact about advanced methods of mathematics (it works for engineering and physics, too). It's a lesson that took a long time to learn, and I'm giving it to you free: the whole point of advanced mathematics is not to feed egos, but to make goats look like heroes.
We often do that by devising systematic approaches that would and can be done the same by everyone. In other words, reduce a process that calls for cleverness to one that involves simply turning a figurative crank. That's the case with the kind of problem we've been studying here.
As a step in that direction, it was decided that we would always write down the equations with only terms involving the unknowns on the left of the equals sign, and known values on the right. In that format, our equations look like this:
J +M +S =100      (9a)
S –J =40      (9b)
J +M –S =0      (9c)
With the equations in this form, we can solve them very systematically by adding and subtracting the equations (scaled, if necessary) to/from each other. Adding equal values to both sides of an equation won't unbalance it. Everything will still balance as long as we remember to add both sides of the equations. As I do so for this example, I'm going to retain all three equations so you can see how they change at each step.
According to this turn-the-crank process, we always choose to eliminate the first variable first. For our example problem, let's add Equation 9a to 9b, and subtract it from 9c. The new equations are:
J +M +S =100      (10a)
2S +M =140      (10b)
–2S =–100      (10c)
As you can see, the unknown J now appears only in the first equation. Purely by happenstance, we managed to remove M from the third equation. But because I'm trying to follow a systematic approach here, I'm going to pretend I didn't notice, and go on. The next step, is to use Equation 10b to isolate M . Subtracting this equation from Equation 10a gives:
J –S =–40      (11a)
2S +M= 140      (11b)
– 2S =–100      (11c)
Now M appears only in the second equation. To complete the job, we must eliminate S from the first two equations. First subtract 1/ 2 of Equation 11c from 11a, and add it to 11b. We get:
J –S+S =50–40      (12a)
M =140–100      (12b)
–2S =–100      (12c)
M =40      (13)
which agrees with our previous, more ad hoc solution.
You may have noticed that the more systematic solution can take more operations than we might take if we were more clever. The price of making goats look like heroes is that the process may sometimes take a little longer. It's a price we tend to gladly pay, to keep us from falling short of hero status.
Over time I've observed that if we're able to attach a name to a problem, we're a long ways along toward solving it. If a doctor can say “you've got a virus” as opposed to, say, “you've got a hangnail,” he knows which remedies are likely to work and which ones won't. A lawyer who says, “This is a case of personal liability” rather than a case of murder, can eliminate a lot of the books in his library, in search of a precedent.
It's the same way with math problems. The equations of the kind we've been looking at are of a special kind. Not only are they simultaneous equations, but they're simultaneous linear equations–linear because each of the unknowns appears only to the power one. There are no J 2 terms or terms. The science of solving problems involving multiple linear equations in multiple unknowns is called linear algebra. Once we have recognized the word problem as one of linear algebra, we're very much farther than halfway to a solution.
Remember this: you always need as many equations as you have unknowns. If you have, say, three equations but four unknowns, the problem is under-specified. You can't solve the problem entirely. It's not that there is no solution to the problem, but rather that there's an infinity of them. The most you can hope for is to be able to express three of the unknowns in terms of the fourth one.
If, on the other hand, you have four equations in three unknowns, the problem is over-specified. In general, no solutions will satisfy all four equations. The exception is if one of the equations is in fact superfluous–a simple restatement, however disguised, of the other three.
In our example, I had three equations–statements of fact–involving the three unknowns J , M , and S . It should go without saying that a simular problem involving 37 equations in 37 unknowns will take longer to solve, but is no different, in principle. If we apply the same turn-the-crank approach (and if–a big if–we don't make a silly math error), we'll eventually get the answer. As my old major professor used to say, “Conceptually, there's no problem.”
Before we leave this example, I want to show you how the problem of simultaneous linear equations morphs into a matrix equation. To begin, we first agree to write down all the possible terms in each equation, with explicit coefficients, including 1 and 0. If a given unknown doesn't appear in the equation, we put it in anyway, with a coefficient of zero. We also keep the unknowns in the same order throughout.
Following this rule, Equations 9a through 9c look like this:
1J+ 1M+ 1S =100
– 1J+ 0M+ 1S =40      (14)
1J+ 1M– 1S =0
Looking at this form, we begin to get the idea that the names of the unknowns doesn't matter. If we had called them x , y , and z , it wouldn't have changed the solution. In fact, the essence of the problem is in the array of coefficients themselves. If I give you that array, you are way far along in finding the solution. In our case, that array is:
At this point, I'm going to assert that this array is, in fact, a matrix. By convention matrices are denoted by uppercase, boldfaced characters, while vectors are lowercase and boldfaced. Without knowing anything about matrix-vector math, let's define the two vectors:
Then I can write the problem in the incredibly compact form:
Ax =u      (17)
There is no such thing as a division operator for matrices, but there is the next best thing: an inverse operator. So, conceptually, I can write:
x =A -1 u      (18)
How does one compute the inverse of a matrix? We've already done it. The steps we took in Equations 9 through 13 are nothing less than the Gaussian elimination leading to the inverse. This one time, we did it the hard way, only one step removed from the high school way. If you never liked those word problems that ended up as simultaneous linear equations, rejoice: this is the last time you'll ever have to do it. Armed with a matrix inverse operator–all the commercial math aids have them, as will our C++ library–you can go from Equation 17 to 18 in a single step.
As a teaser, I'll just toss out there that, while the matrix division operator doesn't exist, I've appropriated the division operator in the C++ class. Thus, turning Equation 18 into C++ code is as simple as writing:
x = u/A;
An important point may have slipped past you. As we solved the example problem, it was not at all obvious that the input or constraint vector, u , is practically immaterial to the solution. The matrix A is purely a matrix of the coefficients of the elements of x –in our case, J , M , and S . Therefore we can invert the matrix as a standalone entity, completely independently of u . In my statement of the problem, I could have substituted any constant values whatever for the elements of u . While the resulting values for x would absolutely be different, the inverse of the matrix itself would be exactly the same. This fact gives us a lot of leverage if we're looking at different values of u at different times. No matter how many such values there are, we only need to invert A once.
Other matrix applications
Solving word problems is certainly not at all the only problems we can cast into matrix form. Many other problems look a lot easier using matrix algebra, and many others are the same problem, in disguise. A familiar one is linear regression, which in turn is a subset of polynomial regression, which is itself a subset of optimal curve fitting. In Figure 1 , I've shown a set of data points with a lot of scatter, which we probably got from a sensor beset with lots of noise. Our challenge is to fit a curve through the data.
Suppose I try to draw a straight line through the data. Clearly, it's not going to pass through all the points; chances are, it won't pass through any of them. The equation for a straight line is:
y (x )=ax +b      (19)
where a and b are coefficients to be determined. I can choose lots of different straight lines and claim that they're the best fit to the data, but which one is really best?
This is the problem of the least-squares fit. For any of the points, we compare the value of y give by Equation 19, with the measured value. For want of a better criterion, I'm going to claim that the best fit is one that minimizes the sum of the squares of the errors. That's the least-squares fit.
I'm going to be covering this problem in detail in a later column. For now, I only want to say that we can write down the equations for the sum of the squares of the errors. Call it M for historical reasons. To minimize this sum, we end up taking the partial derivatives of M with respect to a and b . The optimal fit occurs when both partials vanish.
For a linear equation, the results are going to look like:
pa +qb =r      (20)
sa +tb =v
where the constants p , q , etc., depend only on the measurement values, not a or b . Does this look familiar? Yes, indeed, it's our old friend, the simultaneous linear equation. Using the same techniques, we can easily solve for the coefficients of y (x ).
When we get into the least-squares problem, you'll see that the size of the array is completely independent of the number of data points. We need at least two to fit a straight line through them, but two or 200,000, it makes no difference. The matrix will be different, but we're still only solving for two unknowns.
What's more, if we like, we can assume a higher-order polyomial. The curve in Figure 1 looks like it might be a cubic, so will have four coefficients. The number of coefficients makes little difference. The only difference between them is the size (order) of the matrix. And even when we have many, many data points, the order is modest.
things to come
Simultaneous linear equations and the least-square fit. Seemingly unrelated problems, but they both fall before the might of the matrix inverse. But these two disciplines are not by any means the limit to the value of matrix algebra. Other areas include:
• Coordinate transformations.
• Rotational dynamics.
• Multivariate control theory.
• Optimal control theory, including the Kalman filter.
As we go through the steps to define a matrix class, I'll be showing you, as I did for the vector class, practical applications. I think you'll enjoy the trip.
Jack Crenshaw is a senior systems engineer at General Dynamics and the author of Math Toolkit for Real-Time Programming. He holds a PhD in physics from Auburn University. E-mail him at firstname.lastname@example.org. For more information about Jack click here