Here's a step-by-step analysis of why you multiply matrices.
When I took my first college course on matrices, the professor wasn't big on explanation. He showed us how to multiply matrices but didn't say why.
My first reaction was, “You're kidding us, right?”
The rule seemed so bizarre and arbitrary that it must have come to some theoretical mathematician in a drug-induced nightmare. Surely there had to be a more rational approach.
But guess what? There isn't. The rule makes perfect sense, when you see where it came from. My goal here is to give you an understandable rationale for why we do it the way we do. It'll still take you a little while to get used to the idea, and even longer to be comfortable with it. But I hope that, after my explanation, you'll at least see that it's not an arbitrary convention.
To explain the rule, let me begin with the set of linear equations that I showed you in my last column (“Who needs matrices?” December 2007, p.11):
As a first step in organizing the equations, I wrote every coefficient of the unknowns explicitly, even when it was 1 or 0:
Next, observing that the list of unknowns, the array of coefficients, and the values of the constants on the right-hand sides seem to be different sorts of things, I collected them into arrays, like this:
The last step was to name the arrays in brackets:
which reduces our equations to the ridiculously simple form:
Once I get the equations in this form, I mentally go “Ah! Linear algebra!” and an impressive array of tools stands ready to help. Understand, though, that we haven't changed the underlying relationships at all. Equation 5 is simply a shorthand version of Equation 3, which is itself a shorthand version of Equations 2, which are formalized versions of Equations 1. Conceptually, I should be able to switch back and forth between forms, to my heart's content.
Until, that is, I get to Equation 3. To go backwards from Equation 3 to Equations 2, I have to perform the act of matrix multiplication–an operation I haven't defined yet. But when we look at the two forms, it's easy to see what must be done. We must multiply each of the elements of each row of A by the elements of the column vector, x .
It'll become more clear if I assign letter values to each element of A . The matrix equation:
has to expand to:
Comparing this form to that of Equation 6, you can see all the players in their proper places. Well, almost all. There are the coefficients of A , in the same order as in the matrix. There are the constant values u , v , and w , again just where they ought to be. And there are . . . um . . . the unknowns, looking a little out of place. In Equation 6, they're in a vertical (column) array, but in Equation 7, they seem to be more like a row order.
Sorry, but there's nothing we can do about this. We could agree to write the unknowns as a row vector, but that would not only get us in trouble later, it would put us out of step with the rest of the civilized world. Better that we grit our teeth and accept the rules as they stand. After all, we (sort of) made them, when we chose to write Equation 3 as we did.
This, then, is how you multiply matrices and vectors. It's a two-finger operation. Perhaps the real giants of mathematics could play 8-note chords on their mathematical pianos. Perhaps some could even hear the music in their heads, using no fingers at all. As for the rest of us, we're reduced to playing Chopsticks.
Look at Figure 1.
Put the index finger of your left hand on the upper left element of A (the a ). Put the finger of your right hand on the first element of x , which is the scalar x . That's your first product, ax . Write it down. Now move your left finger one position to the right, and your right finger one position down. Your next product is by . Add that to the ax term. Now move the fingers one more time, to get cz . Now you've got your first sum, which is the expression:
It's also the left hand side of the first of Equations 7, and it's equal to the first element of u , which is the scalar u .
To get the second and third equations, repeat the process using the second, and then the third row of A .
Well, I told you it was bizarre. But again, there's no getting around it. We wrote the rule that requires this kind of matrix product, when we chose the form for Equation 3. That form is so straightforward and so rational that we accept the resulting multiplication as a necessary evil.
After using matrices for awhile, do you ever get used to the two-finger process? I don't know. I've only been using it for 40 years or so. If I ever get to the point where I can hear the music without using my fingers, you'll be the first to know.
Did you notice, by the way, that the products in Equations 7 look like vector dot products? Remember, to get the dot product of two vectors, you multiply them element by element and add the partial products. If we write the matrix A as a set of row vectors, we can write:
where each of the elements a , b , and c are row vectors. We rarely use this notation, because vectors are almost always thought of as column arrays, but it's perfectly okay to think of the individual equations as dot products.
Subscripts and indices
Before we move on, there's one more change of notation that I have to make. You'll notice that, in Equation 6, I assigned a unique character to each element of each array. Even for that one representation of three equations in three unknowns, I used over half the letters of the alphabet. It's pretty clear that we are soon going to run out of them.
Matrix virtuosos get around this problem by using a subscript notation. Such a virtuoso would write:
This equation expands to the three scalar equations:
Now, don't panic just because you see subscripts. Don't let this be an eyes-glaze-over moment. All we've done is replace one unique set of identifiers with another.
The meaning of the subscripts like x 1 , x 2 , and x 3 should be obvious. x 2 is the second element of x . The double subscripts are less obvious, but simple. Element a 23 is not the twenty-third element of A , but rather the element in Row 2, Column 3. The indices always go in this order: row first, then column. The position of the element a ij is uniquely defined as the element in Row i, Column j of the array.
What do you do if the indices extend to more than a single digit? Gee, I don't know; I've never had to deal with that. If you find the notion that 12 doesn't mean twelve is too confusing, you can always put in a Pascal-like comma, and write a 1,2 . Personally, whenever I get beyond the range of single-digit indices, I hope I've already been able to generalize to element ij . And I can tell you this for certain: before I try to multiply 10×10 matrices by hand, I'll take the time to write a C++ function to do it. Trust me, the two-finger method is by no means up to that task.
Let the compiler do it
Fortunately, the people who developed the first high-order languages (that would be folks like John Backus of Fortran and Niklaus Wirth of Pascal fame), knew that vector and matrix algebra would be high on the list of potential applications, so they designed the ability to handle arrays into their languages.
Equally fortunately, the matrix product rule is tailor-made for a set of nested loops. Consider Equations 11 again:
The only difference in the three equations is the value of the first index. So I can say, simply:
Here's a code fragment that does the product. Write this, and you can retire your two index fingers:
for i = 1..3 sum = 0; for j = 1..3 sum = sum + a(i,j)*x(j); end u(i) = sum; end
Here, I've chosen to use Matlab notation rather than C/C++. That's because there are still some issues we need to discuss, such as 0-based vs. 1-based indexing. The code above is easy to translate, though, to any language you might choose. For now, I'd rather move on to more conceptual sorts of things.
What does it all mean?
At its most fundamental level, a matrix transforms a vector by scaling it in some fashion. Here's a matrix that multiples the x , y , and z components of a vector by different scale factors:
Try this, using the two-finger method, and see what happens.
Here's a matrix that simply doubles any vector it multiplies. It's called a scalar matrix , because it has the same effect as multiplying every element of the vector by a scalar:
And here's a matrix that does nothing at all. It's a scalar matrix with a scalar value of unity. It's called, for obvious reasons, the unit matrix , and it's traditionally denoted by I (some folks prefer a boldfaced 1 ).
Don't laugh; it has its uses, some of them quite important. For example, suppose we know that a matrix product with a vector must produce a new vector that's parallel to the old one. That is:
It's tempting to write:
But this makes no sense. A is a matrix, while is a scalar. We can, however, write:
In words, this says that the matrix must scale x so that every element is zero. As it turns out, this can only happen for specific values of , which are called the eigenvalues of A . Eigenvalues are hugely important in many practical areas of physics and engineering, including stability of feedback loops and other controls issues. We don't even have to look at x to learn this stuff. The eigenvalues of a matrix are properties of the matrix only; we can study system stability without bothering to consider what happens to x .
One can look at a matrix-vector product in two very different ways. From one point of view, a matrix transforms a vector in some fashion or another, like the simple doubling of Equation 15. More often, it's a more profound change, but you can usually imagine that the matrix warps an existing vector into a new but related one. This is the viewpoint that we see more often.
But there's another viewpoint that's diametrically opposed to this one, and it's sometimes a useful one as well.
Books on vector analysis often introduce the concept of basis vectors , which are vectors that are fundamentally associated with some vector space or another. For example, consider the Cartesian coordinate system of Figure 2.
Let us define unit vectors , , and that lie along the three coordinate axes (the little hats over the characters indicate that the vectors have unit length). The idea of basis vectors is this: I can express any vector that can be drawn in this space as a linear combination of the three basis vectors. Thus I can write:
Mathematically, Equation 20 says, “create new vectors that are scaled versions of the three basis vectors. Now add them up to build v .” Equation 20 is mathematically identical to the simpler statement:
In one case, we build a vector by adding scaled versions of the basis vectors; in the other, we simply stuff the components x , y , and z into their proper positions. We get the same result either way, but the mental pictures are very different.
Now, if the basis vectors are vectors in the space, they must have vector representations too, right? But what are they? Well, has to be a unit vector lying along the x -axis, so it must have an x -component of 1, and y – and z -components of zero. Similar arguments for the other two vectors lead us to:
Now look at the unit matrix shown in Equation 16. Its columns are simply the unit vectors in Equation 22. So I can write:
From this form and the rules of matrix/vector multiplication, it's clear that:
These gyrations may seem to be circular logic, but there's an important distinction to be learned. When I write an expression like Av , my mental picture is one of warping the vector v in some fashion. Likewise, the product Iv should simply be equal to v , which it is because the columns of I are the unit vectors of the space I'm using. But the roles of I and v have been completely reversed in the two points of view. When I look at Equation 24, I can see that the elements of v are not being treated as vectors at all, but simply coefficients of a linear combination of the basis vectors. The master has become the student.
Which point of view is the correct one? Neither, and both. They both say the same thing, but sometimes one viewpoint leads to a simpler mental picture than the other.
Like now. The unit matrix I is a very special one, after all, but it's only special in the sense that its basis vectors happen to be very simple. But if the matrix/vector math works for that matrix, it must always work for any other. So I can think of any matrix A , then, as a set of basis vectors who don't happen to be so cooperative as to lie along the coordinate axes. Then I can write:
where a 1 , a 2 , and a 3 are three vectors –the basis vectors for the space described by A .
Where's he going?
You may be asking, what does all this stuff have to do with a C++ matrix class? Oh, not much, except that it defines the rules of mathematics that matrices must obey. Let me explain.
Suppose I have two matrices, A and B , which I use to transform some vector x . Let:
Now let me add these two vectors:
But suppose I write A and B in terms of their basis vectors and perform the products as in Equation 24. I should get:
This tells us that, to add two matrices, we must add their basis vectors, column by column. But we already know that vectors add element by element, so the rule for adding matrices must be:
Add them element by element.
If you liked the way the basis vector viewpoint defines how matrices should add, wait'll you see how it illuminates multiplication. Let's begin with the vector:
And the matrix:
using the form of Equation 28, the product must be:
Now let's transform this product again, by the matrix A . We have:
We'd like to be able to define the product of A and B , as in:
Comparing this with Equation 33, we see that the product can only be:
In other words:
A matrix transforms another matrix, column by column.
That's our rule for multiplication. A little reflection should convince you that matrices are associative . That is:
This is important, because there are many times when we might want to transform many vectors x , but the values of the transforming matrices don't change. According to Equation 36, I can perform the matrix product just once and use the result to transform as many x 's as I like.
Don't get too cocky, though. While matrices are associative, they are not commutative.
We've learned how matrices must add and multiply. We have only one primitive operation left, so I'll try to squeeze it in.
There is no such thing as a division operator for matrices, but we can define the next best thing. In ordinary scalar arithmetic, the inverse of any number (except zero) is defined by:
It's common to write the inverse as s –1 .
If there is such a thing as a matrix inverse, it should have the properties:
where I is the unit matrix of the proper size.
Is there such a matrix? It turns out that the answer is yes, as long as:
• A is square,
• And the determinant of A is non-zero.
How do we compute that inverse? Ah, that's the subject of another column or two, but here's a hint: you've already done it, if you've ever solved a set of simultaneous equations, as we did in “Who needs matrices?” (December 2007, p11).
Now that we have the rules down for the primitive math operations, we can begin to start turning them into code. That's where we're going next. See you then.
Jack Crenshaw is a senior systems engineer at General Dynamics and the author of Math Toolkit for Real-Time Programming. He holds a PhD in physics from Auburn University. E-mail him at . For more information about Jack