# Rosetta Stone explained

July 26, 2005

As promised, this month we're going to talk about my favorite relationship, which is the operator relationship:

z = e^(DT)   (1)

where D is the time-derivative operator d/dt, and T is the time step between samples.

This innocent-looking equation pretty much captures everything we need to know about the transition from the universe of continuous time in which we live (where the operator D makes sense), to the world of digital computers, digital signal processing, and digital control systems, where z lives. Though its development is purely theoretical, Equation 1 has immediate and profound usefulness in all those areas, plus a lot more. Because it connects the analog and digital worlds, I call it the Rosetta Stone.

I've derived the expression a couple of times before in this column, but I always manage to leave some folks behind. Even those who get it, I fear, don't really see how to put the relationship to work to solve their problems. So it remains an elegant, but essentially useless, curiosity—a mantelpiece gimcrack.

I intend to fix that situation, starting now. I'll explain both the derivation and the use of the equation so thoroughly that everyone reading this will get it. To help you understand this concept, I'll start from a point further back where we're all comfortable and work forward in small steps. Then I'll go further forward describing more applications to help you understand how to put the thing to work.

I can't do this in one column, so bear with me; we're embarking on yet another multi-issue study. Grab a bite, drink a Jolt cola, and let's get started.

The Taylor series
The jumping-off point for the Rosetta Stone derivation is the Taylor series, which looks like this:

y(x) = y(0) + x·(dy/dx)|0 + (x^2/2!)·(d^2y/dx^2)|0 + (x^3/3!)·(d^3y/dx^3)|0 + ...   (2)

What's that you say? You're already lost? Well, duh! What a surprise. Last time, I used this equation as a great example of the kind of thing that makes people's eyes glaze over and causes their brains to shift into neutral. What was I thinking in starting there?

First, we have to step back far enough so that, when I show you Equation 2 again, you'll not only understand it, but accept it as absolute and obvious fact. To do that, let's go back and look at what functions are and how they work.

Functions
Remember Moore's Law? It says that computer capabilities double every 18 months. When graphed, as in Figure 1, the law shows that after 10 years, computer capability should have gone up by a factor of something over 100. In 20 years, it should be up by a factor of 10,000 or so, and that prediction is pretty darned close.
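Those factors are easy to sanity-check in a few lines of code. This is just a sketch of the doubling rule; the function name is mine:

```python
# Moore's-law growth: capability doubles every 18 months (1.5 years).

def capability_factor(years, doubling_period=1.5):
    """Growth factor after `years`, doubling every `doubling_period` years."""
    return 2.0 ** (years / doubling_period)

print(capability_factor(10))   # a bit over 100
print(capability_factor(20))   # roughly 10,000
```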

Figure 1: Moore's Law

The graph in Figure 1 is an example of a functional relationship. It says that the computing capacity depends upon the time. Give me a time, and I'll tell you how much computing capacity has increased. It's conventional to emphasize that things like capacity are functions of other things, like time, this way:

capacity = f(time)   (3)

The purpose of the parentheses is to tell you what C is a function of.

Because we physicists and mathematicians are a lazy lot, we tend to use single-letter, or few-letter, identifiers when we can. Bad practice for programming, but good practice in algebra. A shorter version of Equation 3 is:

C = f(t)   (4)

which we would read, in English, "C equals ƒ of t." Even more concisely, we may simply write:

("C of t") (5)

Either form makes it crystal clear that C is dependent on t. Not surprisingly we call t the independent variable, and C the dependent variable. When we plot the relationship as a graph, we always plot the independent variable on the x- (horizontal) axis, as in Figure 1, and the dependent variable on the y- (vertical) axis. The reason is purely historical; it's tradition.

Not just any old curve on a graph qualifies to be called a function. The curve of Figure 1 is special in a number of ways. First and foremost, the independent variable—t in this case—can take on any value, at least within some specified range. There can be no gaps in the curve because of some restrictions on t. In short, the independent variable is continuous.

Later on, when we get to digital systems, we'll deal with cases where this assumption isn't true. In fact, the whole purpose of Equation 1 is to let us move between the two universes, those of continuous time and discrete time. But right now we're still in the domain of continuous variables. In computer terms, think floating point rather than integers.

Next, a function must be single-valued, which means that, given a value of the independent variable, there can be only one value of the dependent variable. That's certainly the case in Figure 1. Given a value of t, only one value of C fits the functional relationship.

Finally, the function itself must be continuous. We can give a very precise mathematical definition of a continuous function, but intuition works well here, also. In layman's terms, we should be able to draw the graph of the relationship without lifting our pen.

Figure 2: Piecewise-continuous functions

The mathematically inclined among you will see right away that what I just said isn't rigorously true. Plenty of useful functions aren't continuous. Figure 2 shows some examples. In this figure, the curves are well-defined, nicely continuous functions, except at certain discrete values of x—the yellow dots in the figure. Usually the number of such points is relatively small over some specified range. Except at these special points, x can take on any value, and there is only one, well-defined value of y. We say that such curves are piecewise-continuous, and they represent acceptable functions.

Forecasting the future
With the benefit of the definitions concerning functions, we're now armed to attack the central problem of this analysis, which is predicting future values of a function. When doing such things, I like to start with the simplest case I can think of (the "Hello, world" of math). Instead of me just giving you the function, let's make a game of it. I'm going to give you a few values for y(x). Your challenge is to tell me what the next value will be, with the help of Table 1.

Table 1: Guess my value

| x | y(x) |
|---|------|
| 0 | 1.23 |
| 1 | 1.23 |
| 2 | 1.23 |
| 3 | 1.23 |
| 4 | 1.23 |
| 5 | 1.23 |

Not sure? Maybe the graph in Figure 3 will help.

Figure 3: Hello, constant

So what's the predicted value at x = 6? Clearly, the most likely value is (duh!) 1.23.

As silly as this example might seem, there's an important lesson to be learned. In the absence of any other information, the best guess value for the next point is the value the function had last. Simple. Let's put this concept into a mathematical form. Write:

y(x + 1) = y(x)   (6)

Since no one's given us a limit on the range of x, we might as well assume it can take on any value. In this case, it doesn't matter anyway; Equation 6 is a statement that y doesn't really depend on x at all.

OK, what's the next step? How about a straight line? Table 2 and Figure 4 break it down for you.

Table 2: A straight line

| x | y(x) |
|---|------|
| 0 | 1.23 |
| 1 | 2.69 |
| 2 | 4.15 |
| 3 | 5.61 |
| 4 | 7.07 |
| 5 | 8.53 |

Figure 4: Hello, line

From your college algebra, you might remember the concept of a slope. We can find the slope of a straight line by picking two points—it doesn't matter which two, since the line is straight—and plugging the values into the formula:

m = (y2 - y1)/(x2 - x1)   (slope) (7)

For no good reason, I'll use:

x1 = 0   y1 = 1.23

x2 = 5   y2 = 8.53

Equation 7 gives:

m = (8.53 - 1.23)/(5 - 0) = 7.30/5 = 1.46   (8)

The slope of the curve is 1.46, which means that every time x advances by one unit, y will advance by 1.46 units. From this, we can guess that when x is 6, y will have advanced to:

y(6) = y(5) + m = 8.53 + 1.46 = 9.99   (9)

More importantly, we can guess what y will be for any new value of x. One form of the equation is based on the y-intercept, which is the value of y when x is zero. The equation is:

y(x) = y(0) + m·x = 1.23 + 1.46x   (10)

Plug in x = 6 to get:

y(6) = 1.23 + 1.46(6) = 1.23 + 8.76 = 9.99   (11)
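The whole two-point fit-and-predict procedure takes only a few lines of Python. This is a sketch using the numbers from Table 1's successor, Table 2; the function name is mine:

```python
# Fit a straight line y = y0 + m*x through two points, then predict a new value.

def fit_line(x1, y1, x2, y2):
    """Return (intercept, slope) of the line through two points."""
    m = (y2 - y1) / (x2 - x1)   # the slope formula, Equation 7
    y0 = y1 - m * x1            # the y-intercept
    return y0, m

y0, m = fit_line(0, 1.23, 5, 8.53)
print(m)           # ≈ 1.46
print(y0 + m * 6)  # ≈ 9.99, the predicted value at x = 6
```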

Hello, curve
I suppose you can guess what's coming next. I'll step up to the next level, which is a quadratic. No, you won't have to solve the quadratic equation. That's only required when we go the other way, solving for the x's that produce a given y—and a given y can correspond to two x-values. In this case, we're going the "easy way." Table 3 and Figure 5 tell the story.

Table 3: A quadratic

| x | y(x)  |
|---|-------|
| 0 | 1.23  |
| 1 | -1.36 |
| 2 | -2.71 |
| 3 | -2.82 |
| 4 | -1.69 |
| 5 | 0.68  |

Figure 5: Hello, curve

As we get into higher and higher orders for the function curve, we're going to find that the math gets messy in a hurry. The rules for fitting a quadratic equation are not outrageously hard, but not trivial, either. Since we've done it before in these pages, I won't go into the details. Instead, I'll just outline the general approach.

One aspect of the solution, we could probably have predicted. For the case where the function was constant, we needed only one value: the value of y at any value of x. For the case of a straight line, we needed two points, denoted by {x1, y1} and {x2, y2}. It probably won't surprise you to learn that we're going to need three points to fit the quadratic. As before, any three will do, as long as their x-values are distinct.

We begin with a general form of the quadratic equation:

y(x) = A + B·x + C·x^2   (12)

Evaluate this function at three points to get:

y1 = A + B·x1 + C·x1^2
y2 = A + B·x2 + C·x2^2
y3 = A + B·x3 + C·x3^2   (13)

These are three equations in the three unknowns: the coefficients A, B, and C. After a little algebra, we get the algorithm:

C = [ (y3 - y2)/(x3 - x2) - (y2 - y1)/(x2 - x1) ] / (x3 - x1)   (14)

m1 = (y2 - y1)/(x2 - x1)   (15)

B = m1 - C(x1 + x2)   (16)

And, finally:

A = y1 - x1(B + C·x1)   (17)

Let's try this solution for our example function. We can choose any three points in Table 3, though it's better if they're far apart. I'll arbitrarily choose:

| x | y     |
|---|-------|
| 0 | 1.23  |
| 2 | -2.71 |
| 4 | -1.69 |

Substituting these values into Equation 14 gives:

C = [ (-1.69 - (-2.71))/(4 - 2) - (-2.71 - 1.23)/(2 - 0) ] / (4 - 0) = [ 0.51 + 1.97 ]/4 = 2.48/4   (18)

So:

C = 0.62   (19)

Now that we have C, we can solve for B. Equation 16 gives:

B = m1 - C(x1 + x2) = -1.97 - 0.62(0 + 2) = -3.21   (20)

Finally, from Equation 17, we get:

A = 1.23 - 0·(-3.21 + 0.62·0) = 1.23   (21)

This one is easy, because I chose x1 to be zero. The formula reduces to:

A = y1 = 1.23   (22)

This point deserves emphasis. When x = 0, the function is always equal to its y-intercept. Just as with the lower-order cases, this value has special importance. It's important enough to emphasize it by changing our general equation to:

y(x) = y(0) + B·x + C·x^2   (23)

We should write the equation this way, even if we don't use zero as one of our x-values. However, if we have to evaluate the function there anyway, we might as well insist on using this value for one of the x's. If the range of the function doesn't include x = 0, we can always make sure it does so with a change of variables.

Now we have all three coefficients, and can write down the fitted function:

y(x) = 1.23 - 3.21x + 0.62x^2   (24)

The next entry in Table 3 should be:

y(6) = 1.23 - 3.21(6) + 0.62(36) = 1.23 - 19.26 + 22.32 = 4.29   (25)

From Figure 5, this looks about right.
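The three-point fit is just as easy to mechanize as the two-point one. Here's a sketch; the chord-slope formulation below is one standard way to organize the algebra (Newton's divided differences), and the function names are mine. It assumes the three x-values are distinct:

```python
# Three-point quadratic fit, applied to the points chosen from Table 3.

def fit_quadratic(p1, p2, p3):
    """Fit y = A + B*x + C*x**2 through three points (x, y)."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    m1 = (y2 - y1) / (x2 - x1)    # slope of the first chord
    m2 = (y3 - y2) / (x3 - x2)    # slope of the second chord
    C = (m2 - m1) / (x3 - x1)     # curvature coefficient
    B = m1 - C * (x1 + x2)
    A = y1 - x1 * (B + C * x1)
    return A, B, C

A, B, C = fit_quadratic((0, 1.23), (2, -2.71), (4, -1.69))
print(A, B, C)             # ≈ 1.23, -3.21, 0.62
print(A + B * 6 + C * 36)  # ≈ 4.29, the predicted value at x = 6
```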

Before leaving the subject of fitted polynomials, I want to show you a much more elegant way of finding the coefficients. A word of warning, though: it does involve matrix algebra. If you don't feel comfortable with that, you can skip to the next section.

Considering Equation 13 one last time, let us write those three equations in matrix-vector form:

[ y1 ]   [ 1  x1  x1^2 ] [ A ]
[ y2 ] = [ 1  x2  x2^2 ] [ B ]   (26)
[ y3 ]   [ 1  x3  x3^2 ] [ C ]

Inverting the matrix gives:

[ A ]   [ 1  x1  x1^2 ]^-1 [ y1 ]
[ B ] = [ 1  x2  x2^2 ]    [ y2 ]   (27)
[ C ]   [ 1  x3  x3^2 ]    [ y3 ]

Inverting the matrix by hand is not fun, but a math tool like Mathcad (see sidebar) will do it in an eyeblink. If you prefer, Mathcad will even invert the matrix symbolically, so you can write down the solutions for A, B, and C for the general case. Extension of the method to higher orders is obvious.
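If you'd rather not reach for Mathcad, the same thing can be sketched in plain Python. At 3x3, Cramer's rule is perfectly serviceable as a stand-in for the matrix inversion (this is a demonstration, not a general-purpose solver; the function names are mine):

```python
# Solve the 3x3 Vandermonde system of Equation 26 by Cramer's rule.

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve_vandermonde(xs, ys):
    """Return (A, B, C) with A + B*x + C*x**2 passing through the points."""
    M = [[1.0, x, x * x] for x in xs]
    D = det3(M)
    coeffs = []
    for j in range(3):                    # replace column j with the y's
        Mj = [row[:] for row in M]
        for i in range(3):
            Mj[i][j] = ys[i]
        coeffs.append(det3(Mj) / D)
    return tuple(coeffs)

print(solve_vandermonde([0, 2, 4], [1.23, -2.71, -1.69]))  # ≈ (1.23, -3.21, 0.62)
```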

A slippery slope
Let's review where we've been so far. For the three orders of polynomials, our fitting equations turn out to be:

y(x) = y(0)                   (0th order) (28)
y(x) = y(0) + m·x             (1st order) (29)
y(x) = y(0) + B·x + C·x^2     (2nd order) (30)

Now we can begin to see a deeper structure behind the solution. All of the equations have the y-intercept, y(0), as the constant term. In the case of the constant function, it's the only term, since the function value doesn't change with x. In the first-order fit, the next coefficient is the slope, m, and it also doesn't change with x. For the quadratic, the last coefficient is C. As it turns out, this one represents the curvature of the curve, and it doesn't depend on x, either. Try the fit for any three points in Table 3, and you'll see that you always get the same value for C. Only the middle coefficient, B, changes with the choice of x's. If we were to write down the solutions for higher orders, we'd find that this rule continues: the first and last coefficients don't depend on the x's, but the others do.

Let's look a bit harder at that elusive value of B. Using nothing more than dimensional analysis, we can see that it always has the character of a slope. That is, regardless of what physical parameters x and y represent, the units of B in Equation 21 must be the same as those of m in the linear case, or else the units of y won't come out right. The units of m are:

units of m = (units of y)/(units of x)
The units of B must be the same, since Equation 16 contains m1. And, indeed, the units of the next term agree, as they must:

units of C·x^2 = ((units of y)/(units of x)^2)·(units of x)^2 = units of y
But what, exactly, is this slope called B? Clearly, the slope of the curve changes with x; that's why it's called a curve. But of all the slopes in Figure 5, which one is B?

To answer that question, we're going to have to turn to calculus. Don't panic, however. We don't need much calculus; just the rule for differentiating a power of x:

d/dx(x^n) = n·x^(n-1)   (31)

The derivatives of the first few orders are:

d/dx(1) = 0
d/dx(x) = 1
d/dx(x^2) = 2x
d/dx(x^3) = 3x^2   (32)

Let's agree that the slope of the function, at any x, is m(x), which is now not a constant. Using these rules of differentiation, the derivative of the quadratic function is:

m(x) = dy/dx = d/dx[ y(0) + B·x + C·x^2 ]   (33)

or

m(x) = B + 2Cx   (34)

What is this slope when x = 0? Clearly, it's:

m(0) = B   (35)

Now we see what the value of B represents: It's the slope when x = 0. A slope-intercept, if you will. Now let's go back to Equation 30, which we can write:

y(x) = y(0) + m(0)·x + C·x^2   (36)

Looking at this equation, it's tempting to assume that C must represent the "y-intercept" of curvature. That is, it's the second derivative. But that would be wrong, as we can see by differentiating Equation 34 one more time:

dm/dx = d^2y/dx^2 = 2C   (37)

So C isn't exactly the second derivative; it's only proportional to it. Therefore we must write, for the quadratic:

y(x) = y(0) + m(0)·x + (x^2/2!)·(d^2y(0)/dx^2)   (38)
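We can verify this relationship numerically against the quadratic we fitted earlier, y(x) = 1.23 - 3.21x + 0.62x^2: its value at zero is 1.23, its slope there is B = -3.21, and its second derivative is 2C = 1.24 everywhere. Rebuilding the function from just those three numbers should reproduce Table 3 (a quick sketch; the variable names are mine):

```python
# Rebuild the fitted quadratic from its value and derivatives at x = 0.

y0   = 1.23        # y(0): the y-intercept
m0   = -3.21       # dy/dx at 0: this is B
ypp0 = 2 * 0.62    # d2y/dx2 at 0: this is 2C

def y_rebuilt(x):
    return y0 + m0 * x + 0.5 * ypp0 * x * x

# Compare against Table 3 at a few points.
for x, y in [(0, 1.23), (3, -2.82), (5, 0.68)]:
    print(x, y_rebuilt(x), y)
```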

To extend the formula to higher orders, note that each time we differentiate higher orders, we get some coefficients out front. For example:

d/dx(x^4) = 4x^3
d^2/dx^2(x^4) = 12x^2
d^3/dx^3(x^4) = 24x
d^4/dx^4(x^4) = 24   (39)

Notice anything familiar about that value, 24? Yep, it's the factorial, 4!. And from the steps of the derivation, it's easy to see how this product gets there. Each time we differentiate, we multiply by the current exponent, which is one less than the one before it: 4·3·2·1 = 24.

Once you see that point, it's not hard to believe that the nth derivative of the function relates to its "y-intercept" by a ratio that's equal to a factorial. This inspires us to guess, for the next order,

y(x) = y(0) + x·(dy(0)/dx) + (x^2/2!)·(d^2y(0)/dx^2) + (x^3/3!)·(d^3y(0)/dx^3)   (40)

Yes, I know, I know. Making an assumption is not the same as giving a proof. But hey, this isn't a math textbook, and I don't need to give you a proof. I only need to convince you that the form is correct.

Before moving on, I want to address the notation in Equation 40. It's starting to get more than a little awkward. What's more, it's hard to even understand what the notation means. Consider the term:

(x^2/2!)·(d^2y(0)/dx^2)

What does the notation even mean? It clearly doesn't mean the derivative of y(0), because y(0) is the y-intercept, and it's a constant. Instead, the notation is intended to say:

• First, take the second derivative of y(x)
• Then, evaluate it at x = 0.

This concept is important enough that it deserves a new notation. Let us write:

(dy/dx)|0

to mean the derivative of y(x), evaluated at x = 0. We can then write Equation 40 in the form:

y(x) = y(0) + x·(dy/dx)|0 + (x^2/2!)·(d^2y/dx^2)|0 + (x^3/3!)·(d^3y/dx^3)|0   (41)

In the next column, we'll look at other ways to simplify the notation even further.

Taylor's insight
Equation 41 is getting very close to the Taylor series. We originally derived it for the case of a quadratic function, but then we extended the result to the cubic polynomial. It doesn't take much of a stretch to believe that we can continue to add terms, to higher and higher orders of polynomials.

y(x) = y(0) + x·(dy/dx)|0 + (x^2/2!)·(d^2y/dx^2)|0 + (x^3/3!)·(d^3y/dx^3)|0 + ...   (42)

This equation looks like an infinite series, and indeed it may be. But bear in mind that it's not infinite if the original function is a polynomial. In that case, the order of the derivative is reduced at each step, and we will eventually reach zero. In other words, Equation 42 naturally truncates itself for polynomials.

Taylor's insight, however, goes much deeper. He conjectured that the same rule applies to all functions, polynomial or not. Equation 42 is the Taylor series.

I need to emphasize that the Taylor series is not, repeat not, an approximation. It's exact. If the series truncates because all derivatives vanish above a certain order, well and good. Then you've got an equation like Equation 41, which is exact. If the series does not truncate, Equation 42 still holds, but now it takes an infinite number of terms to represent it.

Taylor's conjecture, however, was that Equation 42 works for all functions. He said that, if we know the value of a function at some value of x (zero works for me), and also all of its derivatives at the same value, we can predict the value for all x. That's all x, as in, the entire range over which the function is valid.

The mathematicians among you will have noticed, of course, the flaw in this reasoning. There are lots of functions for which the derivatives can't be evaluated beyond a certain order. They don't exist, because of some kind of discontinuity, not in the function itself, but in its derivative. The simplest example I can think of is the triangle-wave function shown in Figure 6a. Its slope changes abruptly at the corners, and so its second derivative is undefined at those points.

Figure 6: A triangular wave

Sometimes the discontinuity is not all that obvious. By integrating the function of Figure 6a, you will get the curve of Figure 6b, which looks very much like a sine wave. The discontinuity is hidden by the smoothness of the curve; we only find its true nature when we evaluate the derivatives.

Based on these considerations, we have to modify our all-inclusive statement. As it turns out, the Taylor series doesn't really work for all functions. It works for all functions that are continuous and have continuous derivatives, all the way out to infinity. That's the restriction. It may seem to be a very limiting one, but in practice, we can usually find a way to meet it. If nothing else, we can treat the functions and their derivatives as piecewise continuous, ignoring the behavior at the trouble spots.

Remember, however, that the number of derivatives we need is infinite. In practice, of course, we can't generate an infinite number of derivatives. What we do, instead, is one of two things. We either:

1. Look for some kind of trick to let us express higher-order terms in terms of lower-order ones, or
2. Cut the process off at some point and call it good enough.

In other words, the approximation comes only when we try to express an infinite series using a finite number of terms. That clearly won't give us an exact fit, but it comes closer and closer as we add terms.
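Both options are on display in the classic example, e^x. Every derivative of e^x is e^x, so every Taylor coefficient at zero is 1/n!—that's option 1's trick, each term following from the one before it. Truncating the sum at N terms is option 2, and the sketch below shows the error shrinking as N grows (the function name is mine):

```python
# Truncated Taylor series for e**x about x = 0.

import math

def exp_taylor(x, n_terms):
    total, term = 0.0, 1.0       # term starts as x**0 / 0!
    for n in range(n_terms):
        total += term
        term *= x / (n + 1)      # next term from the previous one
    return total

for n in (3, 6, 12):
    print(n, exp_taylor(1.0, n), math.exp(1.0))
```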

A last example
I'll close this column by looking at one more polynomial, which is fourth order. Let:

 (43)

Differentiating this function a bunch of times yields:

 (44)

Evaluating all these derivatives at x = 0 gives:

 (45)

Plugging these values into Equation 42 gives:

 (46)

This, of course, agrees with Equation 43, which it had danged well better. It may seem that this has been sort of a frivolous exercise. We started out with Equation 43 and jumped through some calculus hoops, only to arrive back at Equation 43. But that's sort of the point, isn't it? The fact that we got the same series back means that our use of the Taylor series in Equation 42 was correct. More than that, it's a convincing demonstration, if not a rigorous proof, that our extension of Equation 40 was justified.
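The same round trip can be replayed in code. The coefficients of Equation 43 didn't survive into this copy, so the quartic below is a made-up stand-in; the point is only that the Taylor coefficients—each derivative at zero divided by the matching factorial—reproduce the polynomial exactly, with nothing left over:

```python
# Round trip: polynomial -> derivatives at 0 -> Taylor coefficients.

from math import factorial

coeffs = [1.0, -2.0, 3.0, 0.5, 0.25]   # hypothetical a0..a4

def derivative(c):
    """Coefficients of the derivative polynomial."""
    return [k * ck for k, ck in enumerate(c)][1:]

# nth derivative evaluated at x = 0, for n = 0..4
derivs_at_0 = []
c = coeffs[:]
for n in range(len(coeffs)):
    derivs_at_0.append(c[0] if c else 0.0)
    c = derivative(c)

# Taylor reconstruction: coefficient n is the nth derivative over n!
taylor_coeffs = [d / factorial(n) for n, d in enumerate(derivs_at_0)]
print(taylor_coeffs)   # identical to coeffs
```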

Next time
The purpose of this little exercise has been to not only convince you that the Taylor series is valid, but to show you how to use it. In the next column, we'll look at different, less verbose, notations for the series. One of those notations leads directly to the Rosetta equation.

See you then.

Jack Crenshaw is a senior software engineer at Spectrum-Astro and the author of Math Toolkit for Real-Time Programming, from CMP Books. He holds a PhD in physics from Auburn University. E-mail him at jcrens@earthlink.net.

Sidebar: Mathcad

Old-time readers may recall how I sang the praises of the Mathsoft program, Mathcad. "If you don't have Mathcad," I warned, "you're limiting your career." Then along came version 6.0, which was buggy. Version 7.0 was worse, and 8.0, worse yet. I warned, "Forget what I said. If you have an old version of Mathcad, hang onto it. Otherwise, avoid it; it's hopelessly broken." And it was. And I told Mathsoft that it was, so often that they must have cringed every time they saw my name on an e-mail.

More recently, a funny thing has happened. No, it's not that they fixed all the bugs and responded to all my criticisms. True, they fixed some things, but they broke others. What's relatively new is, I can't stop using it. Even though I still curse Mathcad loudly and frequently, I continue to use it every day. Why? Plainly and simply, it can do things no other tool can do—things I need to have done. At my day job, I have access to two other math-related programs, both far more expensive than Mathcad. But neither one lets me do the things I can do in Mathcad when it's being cooperative.

Nowadays, my relationship with Mathcad isn't all that much different from that with Microsoft Office or Windows itself. Hey, I curse them too, and sometimes they crash or lose my data. But when they work, the results are beautiful to behold. Mathcad's like that.

Ten years ago, when I did analyses, I used notebook paper and pencils. I wrote voluminous notes, complete with derivations of equations, tables of results, and graphs in four colors. Today, I do the same things but much faster in Mathcad. It's become virtually my only tool for documenting what I've done. A few months ago, I started using Mathcad to test my equations as I go along, and generate hand checks for software as well. On a good day, I can develop and validate math-related software almost as fast as I can type. My productivity is up by a factor of four. I suppose this technique could be done using any tool, but Mathcad remains my tool of choice. Even with its occasional misbehavior, I'm still way ahead of the game.

