# Efficient Fixed-Point Implementation of the Goertzel Algorithm on a Blackfin DSP

**This “Product How-To” article focuses how to use a certain product in an embedded system and is written by a company representative.**

In this article, we will briefly discuss the Goertzel algorithm flow and its fixed-point implementation on the Analog Devices Blackfin BF5xx processor family using the Blackfin's special arithmetic modes. In particular, we will discuss an efficient implementation of 16.16 fixed-point multiplications on the 16-bit MAC-friendly Blackfin DSP processor.

**Goertzel Algorithm**

The Goertzel algorithm is widely used for the detection of a few frequencies in a given signal input.

Though an N-point Fast Fourier Transform (FFT) algorithm efficiently computes N Discrete Fourier Transform (DFT) coefficients given the N input samples, some applications, such as dual tone multi-frequency (DTMF) don't require all the DFT coefficients. In such cases, the Goertzel algorithm can be used to compute the required DFT coefficients (or frequencies) of the input signal x[n] efficiently using a second order linear filter.

In **[Pro1992]** , a recursive expression for computing k^{th} DFT coefficient is derived from the standard DFT equation as:

Now, the Z domain transfer function governing the above difference equation can be expressed as:

The final expressions for implementation of the Goertzel algorithm are obtained from the above system functions as below:

From Equation (1),

From Equation (4), we are only interested in the infinite impulse response (IIR) filter output to compute the DFT coefficient. Thus, we need not compute the finite impulse response (FIR) filter Y_{k} [z]/S_{k} [z] output for each input sample s_{k} .

The magnitude square value of k^{th} DFT coefficient can be obtained from Equation (4) as:

The computation s_{k} for using Equation (3) requires one multiply and two add operations per sample. In applications like DTMF detection, we are only concerned with the magnitude square of the k_{th} DFT coefficient, i.e. |X[k]^{2} , as given in Equation (5).

Unlike DFT, which requires storing many twiddle factors, we need not store any coefficients in the Goertzel algorithm. We use only one coefficient (*q = 2cos(2Ï€k/N) * ) in the computation and this can be held using a single data register.

**Fixed-Point Implementation**

For k^{th} tone detection, we compute the magnitude square of k^{th} DFT coefficient using the Goertzel algorithm and detect the tone by applying appropriate threshold on the magnitude square value. In this implementation, we will use Equations (3) and (5).

In this application, the values of k and N are available in advance. Thus, we can pre-compute the value *q = 2cos(2Ï€k/N)* in advance. The actual implementation of the algorithm is simple and it involves only multiply and add, and update operations as described in the pseudocode shown below in **Figure 1 below** .

Figure 1: Pseudocode for Goertzel algorithm |

The value of s_{k} [n] grows in the recursive computation in Equation (3) when the signal contains the k<> frequency component. Thus, we cannot use 1.15 or 1.31 fixed-point format for s_{k} [n].

The final amplitude of s_{k} [n] depends on the block size N. If we prefer [n].[16-n] format to efficiently implement multiplication on the Blackfin processor using the multiply accumulate (MAC) unit, then we must scale down in the middle of an iterative procedure to avoid register overflow which will cause a loss of output accuracy.

To best demonstrate a simple and efficient Blackfin implementation, we will use the 16.16 native format for all coefficients, intermediate variables, and input samples.

When performing the multiplication of two fixed-point numbers in [a].[b] and [c].[d] format, the output is in [a+c].[b+d] fixed-point format. This means, when two 16.16 numbers are multiplied, the output is in 32.32 format.

Here we must normalize the multiplication output back to the 16.16 format to perform additional arithmetic operations. This can be easily achieved by leveraging Blackfin's MAC special arithmetic modes **[Abf2005]** .

The high performance Blackfin DSP processor can efficiently perform two 32-bit number (in 16.16 format) multiplication (with Blackfin MAC units using the mixed mode option) by consuming 4 clock cycles as shown in **Figure 2 below** . We perform integer quantities multiplication (*i.e. a*c in 16.0 x16.0 format* ) using an integer signed option and then extract the resulting lower 16-bits into r5.h.

Figure 2: Efficient implementation of 16.16 multiplication on the Blackfin DSP |

We then perform fractional quantities multiplication (*i.e. b*d in 0.16×0.16 format* ) using fractional unsigned option and we extract the resulting higher 16-bits into r5.l so that r5 is 16.16 format, and then perform the MAC operation (*i.e. a*d + c*b in 16.0 x 0.16 format* ) using mixed mode option and extracted to r3 (*also in 16.16 format* ).

Now, summing the quantities in r3 and r5 result in normalized output in 16.16 format for the multiplication of two 16.16 numbers.

The Blackfin assembly code for IIR filter iterative loop in Goertzel algorithm is shown in **Figure 3 below** . In Figure 3, we are spending two cycles for updating the intermediate values using simple move operations.

Figure 3: Assembly code for Goertzel algorithm IIR filter loop |

These move operations can be avoided by unrolling the loop twice and directly placing the values in the next computational expression as in the pseudocode shown in **Figure 4 below** .

Figure 4: Pseudocode showing the efficient flow of Goertzel algorithm |

The corresponding assembly code is shown in **Figure 5 below** . This effectively requires six clock cycles to perform a single iteration of an IIR filter loop.

Next, the magnitude square value is computed (*see Figure 1* ) using four multiplications and two additions which require about 18 clock cycles. Once the frequency component magnitude square value is determined then detecting the frequency component presence is achieved by simply applying an appropriate threshold value.

Figure 5: Unrolling the loop to efficiently implement the Goertzel algorithm |

**Conclusion **

With the Blackfin BF5xx processors, about 6 cycles (of which 4 cycles are spent for multiplication of two 16.16 format numbers) are consumed per iteration in the IIR filter loop of the Goertzel algorithm and about 20 cycles for the FIR filter operation. If N = 200, then we require about 1220 cycles to detect one frequency tone in the given signal using Goertzel algorithm.

**Hazarathaiah Malepati** joined Analog Devices in 2003, where he is currently working on embedded algorithm software development for the Blackfin family of processors. From 2000 to 2003, he worked as a research engineer in HIRP (HFCL-IISc research program), Bangalore, India. He has a masters in industrial electronics from KREC, Surathkal.

**Yosi Stein** serves as DSP principal system architect/advanced technologies manager, working in the Digital Media Technology Center on the development of broadband communication and image compression enhanced instruction set for Analog Devices' fixed-point DSP family. Yosi holds a B.S.c in electrical engineering from Technion–Israel Institute of Technology.

* *

**References** **[Pro1992]** John G Proakis and Dimitris G Manolakis, “Digial Signal Processing: Principles, Algorithms and Applications”, Second Edition, Macmillan Publishing Company, 1992.

**[Abf2005]** Analog Devices Inc, “ADSP-BF53x/BF56x Blackfin Processor Programming Reference”, Rev 1.0, Jun 2005.

Dears,

My name is NimoTsai. I am a master student

of EE in NSYSU, Taiwan, Asia.

Could you help me solve the problem: Goertzel algorithm: amplitude v.s. magnitude!

I use the Goertzel filter to catch 2nd harmonic current written in

DSP28335. And all the