Decoding the Rosetta StoneIn my last column, we talked about the origin of what I call the Rosetta Stone equation:
Each time I try to explain the origin of this equation, I find myself wishing that I'd started further back, and continued further forward. This time, I elected to explain the origin of the Taylor series, which is usually written something like this:
As bad as the equation looks—and admittedly it looks horrible--it's based upon a rather simple and very reasonable conjecture: if I know both the value of some function at some point (we've been using x = 0 for simplicity), plus each order of derivative evaluated at the same point, I should be able to reconstruct the function. Each term like:
in Equation 2 is intended to represent one of those derivatives. If you feel comfortable with derivatives, you can prove the conjecture for yourself by choosing some function, evaluating the derivatives, and stuffing them into Equation 2. If your function is a polynomial, you'll find that all higher derivatives vanish. If not, they don't, so the series becomes infinite. Even so, Equation 2 still works. To convince you yet again, I'm going to pick a different function and go through the process. Of course, I don't have unlimited room here, so I can't show you the derivations in detail. You'll just have to trust me that I got this right.
The function is:
A graph of the function is shown in Figure 1.
Figure 1: The function
The first few derivatives of the function are:
Now, here comes the good part. Remember that notation:
That means the nth derivative of x, evaluated at x = 0. Nice as it is to have expressions for the derivatives like those above, we don't really need them. We only need their values (and the value of y(x)) at x = 0. Those values turn out to be shown in Equation 7:
We note in passing that these values aren't going down very fast, with order. That doesn't bode too well for any hopes that only a few terms will suffice. Now, putting the derivatives into Equation 2 gives something that appears to be a simple polynomial in x:
Well okay, the coefficients aren't all that simple. Then there's that ominous-looking ellipsis at the end, which hints of more terms to come. Even so, the form of the equation is just that of a plain old polynomial in x. No hint of the derivatives from which it was built. In that form lies an important lesson: when you look at Equation 2, with all its derivatives of higher and higher order, you might get the impression that the Taylor series involves a lot of calculus. It does—sort of—but then it doesn't. In the end, all those derivatives boil down to simple numbers, like those in Equation 7. And all the numbers merely serve as coefficients in a polynomial.
So in the end, what we're doing is developing a polynomial approximation to a transcendental function (a fancy way of saying it can't be represented by a simple algebraic formula). In my most recent column we saw that if the function being fitted is, in fact, a polynomial, some higher derivative, and all its successors, must compute to zero. In that case, the fit would be exact and the terms in the polynomial merely duplicate the original form. That case was nice to use as a learning tool, but not very useful.
But the function of Equation 3 isn't a polynomial. It's more complicated than that. As you can see from Equation 6, every derivative I've shown has exactly three terms. We have every reason to believe that trend will continue. There will be no end to the non-zero derivatives, so the "polynomial" never ends. It is, in fact, not a polynomial at all, but an infinite series. Hence the little "..." characters at the end of Equation 8.
COMPARING THE RESULTS
Writing down all the terms of an infinite series could take awhile. In practice, what we do is truncate the series at some order and hope that the omitted terms don't hurt our fit too badly. Even so, the best we can hope for is a reasonable approximation to the original function. Since the first term in the series is y(0), we can expect the function to fit perfectly at x = 0. We can hope that the fit will be reasonably good for nearby values of x, but we must accept the reality that the polynomial approximation will eventually diverge from the original function.
How well did we do? Well, Figure 2 shows the comparison over a somewhat limited range. The yellow curve is the original function; the magenta one, the approximation. We can see that the fit is essentially perfect over most of the range, but there's a bit of divergence near the end.
Figure 2: Short-range fit
How bad can it get? Pretty bad. Figure 3 shows the same fit over a wider range. Now we can see that the quality of the fit becomes ridiculously bad if we try to push the range too far.
Figure 3: Long-range fit
If you look at the form of Equation 8, it's not too hard to figure out what goes wrong. The first omitted term is a term proportional to x9. That term happens to be positive, so omitting it should make our approximation head south, in the negative y direction. If x is small--less than one--the 9th power of x is smaller yet. But if x > 1, the power of x gets larger yet. No matter how small the coefficient in front of this term, it will eventually dominate the result, as x increases. The result: the approximation heads for -∞, and there's nothing we can do to stop it.
In the end, no matter how hard we try, a polynomial approximation to a transcendental function is eventually going to diverge, and diverge mightily.
WHAT'S THE POINT?
If the polynomial approximation to a function diverges, why would we want to use it? The short answer: I used it in this case to illustrate how to apply the Taylor series. It was an interesting exercise, nothing more.
More to the point, people fit equations to data for a lot of reasons, and using a lot of methods. If your data comes from empirical measurements, and you don't know the nature of the function, a cubic spline fit or even a simple linear interpolation might suffice. If your data is noisy, you'll probably use a least-squares fit. The least-squares fit almost always involves a polynomial, so the divergence behavior is the same as for the Taylor series. However, in the least-squares case, the fit is specific to a given range and is guaranteed to give the best possible fit to the data points.
Other approaches to function approximations include Fourier series and Chebyshev polynomials. These are two kinds of approximations involving orthogonal functions, which we'll get to on another day.
The value of the Taylor series lies in two areas. First, we can observe that the whole concept is based on the idea of an expansion around some reference point--in our case, x = 0. When x is exactly zero, the fit is exact. This suggests that it will also be quite good as long as we stay "in the vicinity of" x = 0 (whatever that means). Other kinds of approximations aren't necessarily exact at x = 0, so this often gives the Taylor series the edge.
Even more important, however, is the use of the Taylor series for analytical work. Though we can't use an infinite series to generate numbers or draw graphs, we can most definitely use one in a math analysis. In analysis, I can hum along writing summations like:
without ever worrying about the fact that I don't know how to perform the sum.
THE EXPONENTIAL FUNCTION
To illustrate the use of the Taylor series for analysis, I can't think of a better example than the function that appears in Equation 1. We know this function as:
We're going to need to become intimately familiar with this function anyhow, and this seems a perfect place to show you where it comes from.
As you get deeper into the math of functions, you'll find that almost all of them represent solutions to differential equations. The sine function, for example, represents the solution to the equation:
The cosine function is the same, except for the initial condition:
The function we're interested in at the moment, the exponential function, happens to be the solution of the equation:
In other words, the function is its own derivative. In graphical terms, the slope of the function at some value of x is equal to the function value itself. This function is one of the most important ones in all of mathematics and physics, not to mention all of engineering. It shows up everywhere, from the decay of radioactive elements to the discharge of a capacitor.
Can you tell, from looking at the differential equation, what the solution ought to be? Neither can I, but we can find out in one of two ways. If we don't know (or believe in) the Taylor series, we can simply assume an infinite series form. This approach works for lots of problems. In the spirit of full disclosure and the desire to leave no reader behind, I'm going to give you the solution in laborious detail. If you already know all this stuff, bear with me. My main motivation is to show that there's no magic, black or white, involved. It's pretty simple mathematics, though there is a smidge of calculus involved. Let's start by assuming the most general form of a power series:
Where, again, we don't yet know the values of the a's. It's our goal to discover them by using Equation 12. You do need to know how to differentiate a power of x. In case you forgot, the rule is:
Differentiating Equation 13 gives:
Now, Equation 12 says that the two power series in Equations 13 and 15 have to be equal. Not only do they have to be equal, they have to be equal for all possible values of x. The only way this can be true is if the coefficients of each power of x are equal, term by term. That means:
Solving for each higher-order coefficient, we get:
Since every coefficient now depends on the one before it, we can solve for all of them if we know a0. And we do, because the initial condition in Equation 12 requires that:
Since all the other terms in the series vanish when x = 0, this requires that:
Now we can solve for the other coefficients, and get:
The last step is to observe that the numerators of all these coefficients are equal to the factorials 1!, 2!, 3!, and so on. Putting it all together, we can substitute the value for the a's back into the power series to get:
I must emphasize: despite my use of the generic form f(x), the power series in Equation 21 doesn't apply to just any old function f(x), but rather the function that represents the solution to Equation 12. To differentiate this function from all others, we give it a name:
Any scientific programmer will recognize this function. If you're staying one step ahead of me, you know that the mathematical notation for the same function is:
It's not immediately obvious, however, that raising e to the power x is the same thing as generating exp(x). We'll demonstrate that in a moment, but first, another approach to Equation 22.
Remember, we didn't assume anything about the function exp(x) except that:
• it can be represented as a power series, and
• its differential equation is given by Equation 12.
As we've demonstrated, that's all you need to know to solve for the coefficients of the series. On the other hand, if you know (and believe in the truth of) the Taylor series, there's another approach that makes the solution completely trivial. Since the function in Equation 12 is its own derivative, it will probably not shock you too much to discover that it's also its own second, third, or nth order derivative:
What's more, the value of all these derivatives, evaluated at x = 0, is simply f(0), which is 1. Applying Taylor's series gives us the power series in just a couple of steps:
which agrees with Equation 21.
There's a lesson to be learned here, and an important one: the purpose of advanced methods of mathematics, like calculus in general and the Taylor series in particular, is not to make life harder for struggling math students; it's to make it easier by providing ways to solve problems that either couldn't be solved at all or only solved with difficulty. What's more, these methods typically require less thinking and reasoning power, by reducing the solution to a turn-the-crank, no-thought-required sort of procedure.
In the case of the exponential function, the solution based on a general power series wasn't horribly difficult, but it did require a bit more knowledge of calculus, and some tedious substitutions. The Taylor series gives you a solution that will fit on the back of an envelope. Lesson learned. Turning a crank is better.
WHAT IS IT, REALLY?
Now, in Equation 22 I was careful to write exp(x) rather than ex, because we haven't proven (yet) that the two forms are equivalent. It's not so much that the proof is difficult, really, but rather that the truth of Equation 23 is so fundamental, it's almost like an axiom. It's easy to get caught up in circular logic, and end up proving something like 1 = 1. For example, you can say:
because that's the requirement we started with, in Equation 12. You can't say:
because we haven't proven yet that this is true of ex. Remember, no fair using the derivative from a handbook, because it was derived from Equation 23.
So how do we prove that the functions exp(x) and ex are the same? Well, let's start with some relationships concerning ex that come straight from the theory of exponents. They don't depend at all upon the nature of e.
We've already established that the first relation applies to exp(x) as well. It's the initial condition of Equation 12. As it turns out, we only need one more condition to prove that the two functions are equivalent. Remember, in doing so, we can only use things we know about exp(x), not ex. Remember, also, that the only thing we really know about exp(x) is Equation 12.
Let's start with the last of Equation 28. Differentiating (using ex, not exp(x)) gives:
Now just suppose that ex and exp(x) were the same. Then Equation 29 should be equivalent to:
But we already know what the derivative on the right is; it's the one given by Equation 12. Thus:
If, again, exp(x) = ex, this is equivalent to:
So the question boils down to this: Is Equation 31 true?
As a matter of fact, it is. You can prove it in two ways. If you're from Missouri or otherwise hard to convince, you can substitute kx for x in Equation 22, and differentiate, term by term. You'll find that you do, indeed, get back Equation 31.
If you prefer elegance to brute force, you can write this equation, based on the nature of derivatives:
But the derivative of exp(x), we already know to be exp(x). This is the same, whether the argument is x, kx, or banana. The derivative of kx is simply k. So Equation 31 is true.
Now we have two functions, exp(x) and ex. They're equal at x = 0, and their derivatives, in other words their slopes, are equal for all values of x. They must be identical. Now we can say, for sure, that:
and therefore, the power series representation of ex is:
BUT WHAT IS e?
So far, we haven't talked much about the constant, e, itself. It's called the Euler constant, and it's one of the fundamental constants of mathematics, much like π. Equation 35 gives us a good way of computing it. Simply set x to 1, and get:
If you're like me, when your college professor showed you a power series like this and assured you that the summation over an infinite number of terms resulted in a finite value, you weren't completely convinced. After all, every term in Equation 36 is positive. So no matter how small they are, shouldn't their sum grow without limit?
The answer is no. I can't prove this to you rigorously, but you can convince yourself that it's true simply by evaluating the series, term by term. Table 1 below gives the results of the summations, starting with a single term.
Table 1: Summing the series
As you can see, the series does seem to be converging, and converging rapidly. After ten terms, we've already got six digits of accuracy. What's more, you'll find that those six digits will never, ever change again, no matter how many more terms you add. The series converges.
Before we close, I want to show you one more series because it involves another technique we're going to have to get familiar with. This series also shows that you don't need a transcendental function, or even calculus, to get a power series. It is:
How can we prove it? The brute-force way is to simply multiply the series by 1-x. We get:
All the powers of x cancel, leaving only the 1.
But there's another way to derive Equation 35, and that's the method of synthetic division. It's another tool you're going to need in your toolbox, so let's deal with it here. To begin, write down the division on the left of Equation 35, just as though you were going to do an ordinary long division.
Now, think of what you would do in long division. You'd divide the first digit of the dividend (the part inside the division sign), by the first digit of the divisor, right? What is 1 divided by 1? Did you get 1? Excellent; go to the head of the class. Write:
What's next? Again, go back to your long division. The next step is to multiply the last part of the quotient (the 1, in this case) by the divisor, and subtract it from the dividend. Get:
I think you can see where we're going, now. At each "digit" (step), divide the remainder by the first "digit" of the divisor, then multiply the whole divisor by the last digit (not the whole quotient), and subtract. If you practice a bit with numbers, I think you will see that the process is the same. In Figure 4, I show the process for a few steps. As you can see, the quotient turns out to be the series in Equation 36.
Okay, by now you should be pretty well convinced that:
• we know how to handle infinite series
• the functions exp(x) and ex are identical, and
• we know how to do synthetic division
All of these tools will be needed for what follows. Next time, we'll start on Equation 1 in earnest, by introducing the notion of operators. See you then.