Efficient Fixed-Point Implementation of the Goertzel Algorithm on a Blackfin DSP

Hazarathaiah Malepati and Yosi Stein

January 18, 2010

Hazarathaiah Malepati and Yosi SteinJanuary 18, 2010

Fixed-Point Implementation
For kth tone detection, we compute the magnitude square of kth DFT coefficient using the Goertzel algorithm and detect the tone by applying appropriate threshold on the magnitude square value. In this implementation, we will use Equations (3) and (5).

In this application, the values of k and N are available in advance. Thus, we can pre-compute the value q = 2cos(2πk/N) in advance. The actual implementation of the algorithm is simple and it involves only multiply and add, and update operations as described in the pseudocode shown below in Figure 1 below.

Figure 1: Pseudocode for Goertzel algorithm

The value of sk[n] grows in the recursive computation in Equation (3) when the signal contains the k<> frequency component. Thus, we cannot use 1.15 or 1.31 fixed-point format for sk[n].

The final amplitude of sk[n] depends on the block size N. If we prefer [n].[16-n] format to efficiently implement multiplication on the Blackfin processor using the multiply accumulate (MAC) unit, then we must scale down in the middle of an iterative procedure to avoid register overflow which will cause a loss of output accuracy.

To best demonstrate a simple and efficient Blackfin implementation, we will use the 16.16 native format for all coefficients, intermediate variables, and input samples.

When performing the multiplication of two fixed-point numbers in [a].[b] and [c].[d] format, the output is in [a+c].[b+d] fixed-point format. This means, when two 16.16 numbers are multiplied, the output is in 32.32 format.

Here we must normalize the multiplication output back to the 16.16 format to perform additional arithmetic operations. This can be easily achieved by leveraging Blackfin's MAC special arithmetic modes [Abf2005].

The high performance Blackfin DSP processor can efficiently perform two 32-bit number (in 16.16 format) multiplication (with Blackfin MAC units using the mixed mode option) by consuming 4 clock cycles as shown in Figure 2 below. We perform integer quantities multiplication (i.e. a*c in 16.0 x16.0 format) using an integer signed option and then extract the resulting lower 16-bits into r5.h.

Figure 2: Efficient implementation of 16.16 multiplication on the Blackfin DSP

We then perform fractional quantities multiplication (i.e. b*d in 0.16x0.16 format) using fractional unsigned option and we extract the resulting higher 16-bits into r5.l so that r5 is 16.16 format, and then perform the MAC operation (i.e. a*d + c*b in 16.0 x 0.16 format) using mixed mode option and extracted to r3 (also in 16.16 format).

Now, summing the quantities in r3 and r5 result in normalized output in 16.16 format for the multiplication of two 16.16 numbers.

The Blackfin assembly code for IIR filter iterative loop in Goertzel algorithm is shown in Figure 3 below. In Figure 3, we are spending two cycles for updating the intermediate values using simple move operations.

Figure 3: Assembly code for Goertzel algorithm IIR filter loop

These move operations can be avoided by unrolling the loop twice and directly placing the values in the next computational expression as in the pseudocode shown in Figure 4 below.

Figure 4: Pseudocode showing the efficient flow of Goertzel algorithm

The corresponding assembly code is shown in Figure 5 below. This effectively requires six clock cycles to perform a single iteration of an IIR filter loop.

Next, the magnitude square value is computed (see Figure 1) using four multiplications and two additions which require about 18 clock cycles. Once the frequency component magnitude square value is determined then detecting the frequency component presence is achieved by simply applying an appropriate threshold value.

Figure 5: Unrolling the loop to efficiently implement the Goertzel algorithm

With the Blackfin BF5xx processors, about 6 cycles (of which 4 cycles are spent for multiplication of two 16.16 format numbers) are consumed per iteration in the IIR filter loop of the Goertzel algorithm and about 20 cycles for the FIR filter operation. If N = 200, then we require about 1220 cycles to detect one frequency tone in the given signal using Goertzel algorithm.

Hazarathaiah Malepati joined Analog Devices in 2003, where he is currently working on embedded algorithm software development for the Blackfin family of processors. From 2000 to 2003, he worked as a research engineer in HIRP (HFCL-IISc research program), Bangalore, India. He has a masters in industrial electronics from KREC, Surathkal.

Yosi Stein serves as DSP principal system architect/advanced technologies manager, working in the Digital Media Technology Center on the development of broadband communication and image compression enhanced instruction set for Analog Devices' fixed-point DSP family. Yosi holds a B.S.c in electrical engineering from Technion--Israel Institute of Technology.

[Pro1992] John G Proakis and Dimitris G Manolakis, "Digial Signal Processing: Principles, Algorithms and Applications", Second Edition, Macmillan Publishing Company, 1992.

[Abf2005] Analog Devices Inc, "ADSP-BF53x/BF56x Blackfin Processor Programming Reference", Rev 1.0, Jun 2005.

< Previous
Page 2 of 2
Next >

Loading comments...