Doing Hartley Smartly
To read original PDF of the print article, click here. Doing Hartley Smartly
Robert Scott


Think the fast Fourier transform is fast enough?Think again. Meet the fast Hartley transform.
Transforming a series of samples taken in the time domain into a frequency domain representation is useful in a wide variety of signal processing applications. This includes direct analysis of the frequency spectrum as well as operations like convolutions, which are used in digital filtering. The usual tool for performing this transformation is the Fourier transform, or to be more specific, the fast Fourier transform (FFT). But for most applications, there is an even faster method called the Hartley transform. This article reviews some published data on the fast Hartley transform (FHT) and then develops an optimized implementation of this algorithm. A typical FFT application starts with a series of samples taken in the time domain. For example, suppose you want to determine the vibration characteristics of an automobile engine. You could attach an accelerometer to the engine block and sample the output of the accelerometer at regular intervals. Suppose you take 4,096 samples over a onesecond period. These 4,096 samples would then be transformed, using an FFT, into an array of 4,096 elements where each element specifies the vibrational energy at a particular frequency. The frequencies represented in the FFT array run from 0Hz to 4,095Hz in steps of 1Hz. So if you needed to determine the total energy in the frequency band from 200Hz to 300Hz, you could add up the amplitudes of FFT entries 200 through 300. Although this example is described using the FFT, the FHT works just as well. When I first saw an article about the Hartley transform, I must admit that I didn't give it serious consideration. The formulae looked similar to the ones for the Fourier transform, except that the imaginary, i, was missing. I thought there must be some catch. But then, years later, I ran across that article again and decided to delve into it to see what the story was. I was so impressed with what I learned that I immediately wrote a little freeware utility for general aviation enthusiastsa PC program to measure the RPM of an airplane engine based on the sound of that engine. The program (available at www.wwnet.net/~rscott/rpmsound.html) uses the Hartley transform to find the peaks in the frequency spectrum. The pattern and location of the frequency peaks determine the RPM. Hartley transform defined
If, for some reason, you want to undo this transform, the formula is almost identical, except for the absence of the 1/N factor: The first thing that made me suspicious about these formulae was that the sum of the cosine and the sine of an angle is just another phaseshifted sine function (shifted by 45 degrees, to be exact). What could be so magical about correlating against a sine function that was shifted by an arbitrary amount? Well, the answer is “nothing.” In fact, it is possible to define a whole class of reversible transforms based on other phaseshifted sine functions. But the specific sine function used in the Hartley transform exhibits a certain symmetry between the transform and its inverse. And when it comes to implementing the transform, we will see that we end up taking pure sines and cosines anyway and not their sums. Comparisons with the FFT
It turns out that using the Hartley transform to compute the Fourier transform is faster than computing the Fourier transform directly. This is an important use of the Hartley transform because the Fourier transform has immediate natural interpretation while, as far as I know, the Hartley transform does not. Each complex value in the Fourier transform represents the amplitude and phase of a particular frequency component in the frequency spectrum of a time domain sequence. Often only the amplitude is interesting, so applications will then form the array of magnitudes of the complexvalued Fourier series. The implementation of the Hartley transform described here provides for a direct calculation of this array of amplitudes. Note that Equation 2 leads to outofrange indices when f = 0, since there is no H(N). The interpretation of the indices must, therefore, be extended moduloN. Therefore, H(N) takes the value H(0). And, in fact, this moduloN interpretation of the indices is supported by the formulae that define the Hartley transform, which give the same values if you add N to the indices. If you have ever used the Fourier transform to process realvalued data, you may have been aware of a sense of wasted effort. To begin with, you probably stuffed your realvalued sequence into the real parts of an array of complex numbers and zeroed out the imaginary parts. Having all those zeroes using up space but providing no information always bothered me. Then, when the Fourier transform was done, the resulting array of complex numbers had a lot of redundancy. In particular, you never had to look past the midpoint of the array (the socalled Nyquist frequency) because the frequency domain data in the second half of the array was a kind of mirror image of the first half. The complexvalued F(N–f) is just the complex conjugate of F(f). (This is only true for the Fourier transform of realvalued sequences. In general, the Fourier transform has no such redundancy.) When I saw how the power of the Fourier transform was being “wasted” on realvalued sequences, I finally understood how something like the Hartley transform could be more efficient. Butterflies Figure 1: A single FHT butterfly The heart of the algorithm is a computation whose dataflow diagram looks like a butterfly. But unlike the symmetric butterfly of the FFT, the FHT butterfly, as shown in Figure 1, is a bit ugly. It has a vestigial third wing. From now on we will be assuming that N is a power of two. Since N is even, the time sequence, X(t), which appears in the definition of the Hartley transform (Equation 1), can be divided into two intertwined sequences, X0 and X1, given by:
That is, sequence X0 consists of the evenindexed values from sequence X and sequence X1 consists of the oddindexed values. As before, we will consider all indices to be interpreted moduloN. But since X0 and X1 are defined in terms of 2t, they actually repeat modulo N/2. So consider X0 and X1 as sequences of length N/2 and let H0 and H1 be their Hartley transforms. Let M stand for N/2. Then, by the definition of the Hartley transform in Equation 1 we have:
I leave it as an exercise for the reader to verify that: (If you are reading the Byte article, note that this equation is incorrectly shown with Ns instead of N.) It took me about half an hour to verify this equation with paper and pencil, using only the trig identities:
If you want to develop an appreciation of the basis of the Hartley transform, I recommend performing the exercise.Keeping in mind that the Hartley transforms on the right side of Equation 3 repeat modulo N/2, we can recognize the following symmetry: Equations 3 and 4 taken together define the complete FHT butterfly computation, as diagrammed in Figure 1. These two equations express the Hartley transform of length N in terms of two smaller (halflength) Hartley transforms. This is the beginning of a recursive process. At each step in the recursion we recognize that all of the Hartley transforms from the previous step can be further broken down into Hartley transforms of half the length. The process continues until we have only Hartley transforms of length 1, which can be trivially computed from Equation 1. Figure 2: FHT data flow At this point I will abandon mathematical rigor and proceed by example alone. Suppose N = 16. Figure 2 shows a dataflow diagram that illustrates the recursive decomposition of a Hartley transform of length 16. Although the dataflow diagram is computed from left to right, we shall analyze it from right to left. On the far right, the numbers 0 to 15 represent the values of H(0) through H(15), which is the Hartley transform of the original sequence X(0) through X(15). If you let f run from 0 to 7 in Equations 3 and 4, you can see how the 16 values of H can be computed from the two smaller Hartley transforms, H0 and H1, each of length 8. The rightmost column of dataflow lines in Figure 2 illustrates this process as the overlay of eight instances of the butterfly diagram from Figure 1. The lines have been omitted where the corresponding trig factor is zero. In the column of numbers to the left of the final butterfly diagrams, the two sets of indices, 0 through 7, represent the value of H0 and H1 with H0 on top. For example, the first three instances of Equations 3 and 4 in this column of butterflies is:
Note that the indices of H0 and H1 have been reduced modulo 8, since they are Hartley transforms of length 8. The decomposition then continues with the next column to the left, showing how H0 and H1 are each represented in terms of Hartley transforms of length 4. H0 is decomposed into H00 and H01, while H1 is decomposed into H10 and H11. The numbers 0 through 3 are indices into these smaller Hartley transforms. The recursion process continues all the way back to the first column, where we would have 16 Hartley transforms of length 1. Equation 1 shows that a Hartley transform of a time sequence of length 1 is just the original time sequence itself. So the first column represents values from the original time sequence X(t). The numbers in that column are the indices into the original time sequence, X(t). But how did they get into that strange order? To answer this question we must determine exactly which time sequence is transformed into each of the Hartley transforms in Figure 2. Starting again at the right, we know that the full Hartley transform is derived from the original time sequence X(0) through X(15) in normal order. Going back one step in the recursion and recalling how the smaller Hartley transforms in Equations 3 and 4 were defined, we see that H1 is the Hartley transform of the sequence X(0), X(2), …, X(14) and H1 is the Hartley transform of the sequence X(1), X(3), … , X(15). Purely as a computational convenience, we place the time sequence for H1 after the time sequence for H0. We have effectively sorted the sequence, X, according to the leastsignificant bit of its indices.In going from Hartley transforms of length 8 to Hartley transforms of length 4, we further refine the sort on X based on the nextleastsignificant bit of its indices. That is: H00 is based on {X(0), X(4), X(8), X(12)}H01 is based on {X(2), X(6), X(10), X(14)}H10 is based on {X(1), X(5), X(9), X(13)}H11 is based on {X(3), X(7), X(11), X(15)} At each stage in this progressive sort, the results of the previous sort are preserved. Therefore when we get back to the beginning of the dataflow diagram, we have the original sequence, X, listed in an order where the significance of the bits in the indices has been reversed. This is called bitreversed order, and it is just as applicable to the FFT as it is to the FHT. Thus, to perform the FHT, we must first reorder the sequence, X, into bitreversed order. Then we perform the butterfly computations as they are diagramed in Figure 2. Implementation When we calculate the FHT, we must have two arrays. Each round of calculations uses one array for input (the FromArray in the code) and the other array for output (the ToArray in the code). Which array the final transform ends up in depends on whether you had to do an even or an odd number of butterfly calculation rounds. In any case, this inability to do a transform in place contradicts the opening claim of O'Neill's article that only half the memory is needed for the FHT. It is true that the FHT uses only realvalued arrays instead of the complexvalued arrays needed by the FFT, and so the arrays use half as much memory. But you need two of those arrays with the FHT and only one array with the FFT, therefore the memory usage is a wash. In general, each round of calculations requires N/2 evaluations of Equations 3 and 4. Each such evaluation involves a sine, a cosine, two multiplications, and two additions. To make things quick, the trig values can be in precomputed lookup tables. Additionally, we recognize that in each butterfly calculation, the terms in Equation 3 are the same as the terms in Equation 4, except for a difference in signs. Referring to the example of N=16, this allows us to calculate terms H(0) through H(7) and get H(8) through H(15) almost for free. This also means that we only need values of sine and cosine for angles from zero to just under 180 degrees. In fact, I choose to implement the smallest possible trig lookup table, namely, the values of sin(x) in the first quadrant (zero to 90 degrees). The values of sine in the second quadrant as well as the values of cosine in both quadrants will come from this table using the trig identities:
Thus to perform a Hartley transform of length N, we only need to precompute the N/4 values of sin(x) in the first quadrant. In order to simplify accessing this table, I have separate code for the first and second quadrants. The third and fourth quadrants are done as a byproduct using the symmetry noted in Equation 4. Let's jump right to the heart of the implementation. The innermost loop is essentially: while(in first quadrant){ From++; To++; FromRetrograde; SinPtr += ButterflyBlocks; CosPtr = ButterflyBlocks; CSTerm = *CosPtr * From[ButterflyWidth] + *SinPtr * FromRetrograde[0]; To[0] = From[0] + CSTerm; To[ButterflyWidth] = From[0]  CSTerm;}
This has been extracted and simplified from Listing 1. The actual implementation of Equation 3 is in the calculation for To[0]. The calculation for To[ButterflyWidth] shows the use of the symmetry of Equation 4. As you can see, CSTerm only needs to be calculated once for each pair of Hartley transform values. Both SinPtr and CosPtr point into the precomputed sine lookup table, but they run in opposite directions to give the sine and cosine. The values for ButterflyWidth and ButterflyBlocks change for each round. ButterflyWidth is the index offset between the two outputs of the butterfly. It starts at four for round three and doubles in each round. ButterflyBlocks is the number of Hartley transforms on the output side of each round. It is cut in half after each round, ending up at one for the last round. I should point out here that the code actually computes double the Hartley transform at each row because we ignore the factor of 1/2 in Equations 3 and 4. This makes our final result too large by a factor of N. In many applications this error is irrelevant since only relative values are used. But if correct scaling is required, a separate scaling function, ScaleHartleyTransform(), is provided in the code. Without calling this function you actually get the reverse Hartley transform. You will notice that two of the three terms in Equation 3 involve indices that increment as the target index increments. But the third term in that equation uses a backwardsrunning index. This has been called the “retrograde index” in the literature. It is what I call the ugly third wing of the FHT butterfly. It is represented by FromRetrograde in the code. Bit reversal and special cases The first two rounds of the FHT are so simple that I do them all at once as a special case. They involve only degenerate trig values (0 or 1). Also, the first computation of every set of butterflies is special because it has only two terms and no trig lookups. I have, therefore, structured the code to compute the butterflies for zero degrees and 90 degrees as special cases. Interface detailsIn order to use the FHT code, the calling program must first generate a trig table of the appropriate size. The function InitTrigTable() is provided for just that purpose. Thereafter the client code may call FastHartleyTransform() by providing the appropriate parameters. Notice that the client code is responsible for allocating all the arrays, and that the resulting transform may be in either the original input array or the work array. Also note that no static variables are used, so this implementation is reentrant. The return value of FastHartleyTransform() is a pointer to the correct result array. As noted earlier, unlike the Fourier transform, the Hartley transform has no natural interpretation as a frequency spectrum. The most natural use for the Hartley transform is as a means to develop the Fourier transform, as shown by Equation 2. The magnitude of the first half of the Fourier transform is calculated by the PowerSpectrum() function. Performance The implementation in Listing 1 is meant as a real
