Why multiply matrices?
Here's a step-by-step analysis of why you multiply matrices.
When I took my first college course on matrices, the professor wasn't big on explanation. He showed us how to multiply matrices but didn't say why.
My first reaction was, "You're kidding us, right?"
The rule seemed so bizarre and arbitrary that it must have come to some theoretical mathematician in a drug-induced nightmare. Surely there had to be a more rational approach.
But guess what? There isn't. The rule makes perfect sense once you see where it came from. My goal here is to give you an understandable rationale for why we do it the way we do. It'll still take you a little while to get used to the idea, and even longer to be comfortable with it. But I hope that, after my explanation, you'll at least see that it's not an arbitrary convention.
To explain the rule, let me begin with the set of linear equations that I showed you in my last column ("Who needs matrices?", December 2007, p. 11):
As a first step in organizing the equations, I wrote every coefficient of the unknowns explicitly, even when it was 1 or 0:
Next, observing that the list of unknowns, the array of coefficients, and the values of the constants on the right-hand sides seem to be different sorts of things, I collected them into arrays, like this:
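To make the bookkeeping concrete, here's a minimal Python sketch of these organizing steps. The three-equation system is a hypothetical example of our own, not the one from the earlier column, and the names `to_arrays`, `A`, and `b` are likewise our own labels. Each equation is stored as a dict that maps an unknown's name to its coefficient; any unknown an equation leaves out gets an explicit coefficient of 0, just as in the step above.

```python
# Hypothetical example system (not the one from the earlier column):
#    x + 2y      =  5
#   3x -  y +  z =  4
#         y - 2z = -4
equations = [
    ({"x": 1, "y": 2}, 5),
    ({"x": 3, "y": -1, "z": 1}, 4),
    ({"y": 1, "z": -2}, -4),
]

unknowns = ["x", "y", "z"]  # the order we choose for the unknowns


def to_arrays(equations, unknowns):
    """Split a system into a coefficient array and a constants array.

    Every coefficient is written out explicitly: an unknown that an
    equation omits gets a coefficient of 0.
    """
    A = [[coeffs.get(name, 0) for name in unknowns] for coeffs, _ in equations]
    b = [rhs for _, rhs in equations]
    return A, b


A, b = to_arrays(equations, unknowns)
print(A)  # [[1, 2, 0], [3, -1, 1], [0, 1, -2]]
print(b)  # [5, 4, -4]
```

The coefficient array keeps one row per equation and one column per unknown, in the order given by `unknowns`; the list of unknowns and the constants end up as separate arrays.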
The last step was to name the arrays in brackets:
which reduces our equations to the ridiculously simple form:
Once I get the equations in this form, I mentally go "Ah! Linear algebra!" and an impressive array of tools stands ready to help. Understand, though, that we haven't changed the underlying relationships at all. Equation 5 is simply a shorthand version of Equation 3, which is itself a shorthand version of Equations 2, which are formalized versions of Equations 1. Conceptually, I should be able to switch back and forth between forms, to my heart's content.
Until, that is, I get to Equation 3. To go backwards from Equation 3 to Equations 2, I have to perform matrix multiplication, an operation I haven't defined yet. But when we look at the two forms, it's easy to see what must be done. For each row of A, we multiply each element of that row by the corresponding element of the column vector x, then add up the products. That sum is the left-hand side of one equation.
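The row-by-column rule is short enough to write out directly. Here's a minimal Python sketch, assuming a matrix is stored as a list of rows and a column vector as a plain list; the function name `mat_vec` and the sample numbers are hypothetical, chosen only for illustration.

```python
def mat_vec(A, x):
    """Multiply matrix A (a list of rows) by column vector x (a list).

    Entry i of the result is row i of A times x, element by element,
    with the products summed: the left-hand side of equation i.
    """
    if any(len(row) != len(x) for row in A):
        raise ValueError("each row of A must have as many elements as x")
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


# Hypothetical coefficients and trial values for the unknowns.
A = [[1, 2, 0],
     [3, -1, 1],
     [0, 1, -2]]
x = [1, 2, 3]
print(mat_vec(A, x))  # [5, 4, -4]
```

The length check reflects the one structural requirement of the rule: each row of A must have exactly one coefficient per unknown, or there is nothing to pair its elements with.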
It'll become clearer if I assign letter values to each element of A. The matrix equation:
has to expand to:
Comparing this form to that of Equation 6, you can see all the players in their proper places. Well, almost all. There are the coefficients of A, in the same order as in the matrix. There are the constant values u, v, and w, again just where they ought to be. And there are . . . um . . . the unknowns, looking a little out of place. In Equation 6, they're in a vertical (column) array, but in Equation 7, they appear in row order.
Sorry, but there's nothing we can do about this. We could agree to write the unknowns as a row vector, but that would not only get us in trouble later, it would put us out of step with the rest of the civilized world. Better that we grit our teeth and accept the rules as they stand. After all, we (sort of) made them, when we chose to write Equation 3 as we did.
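For reference, here is the same pattern written out for a general 3×3 system. The subscripted names a_{ij} and x_1, x_2, x_3 are our own labels and not necessarily the ones used in Equations 6 and 7; only the constants u, v, and w come from the text.

```latex
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} u \\ v \\ w \end{bmatrix}
\quad\Longleftrightarrow\quad
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 &= u \\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 &= v \\
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 &= w
\end{aligned}
```

Reading across any one equation on the right, the unknowns run x_1, x_2, x_3 from left to right, the row order described above, even though the vector on the left stacks them in a column.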