Efficient Fixed-Point Implementation of the Goertzel Algorithm on a Blackfin DSP
Fixed-Point ImplementationFor kth tone detection, we compute the magnitude square of kth DFT coefficient using the Goertzel algorithm and detect the tone by applying appropriate threshold on the magnitude square value. In this implementation, we will use Equations (3) and (5).
In this application, the values of k and N are available in advance. Thus, we can pre-compute the value q = 2cos(2πk/N) in advance. The actual implementation of the algorithm is simple and it involves only multiply and add, and update operations as described in the pseudocode shown below in Figure 1 below.
![]() |
| Figure 1: Pseudocode for Goertzel algorithm |
The value of sk[n] grows in the recursive computation in Equation (3) when the signal contains the k
The final amplitude of sk[n] depends on the block size N. If we prefer [n].[16-n] format to efficiently implement multiplication on the Blackfin processor using the multiply accumulate (MAC) unit, then we must scale down in the middle of an iterative procedure to avoid register overflow which will cause a loss of output accuracy.
To best demonstrate a simple and efficient Blackfin implementation, we will use the 16.16 native format for all coefficients, intermediate variables, and input samples.
When performing the multiplication of two fixed-point numbers in [a].[b] and [c].[d] format, the output is in [a+c].[b+d] fixed-point format. This means, when two 16.16 numbers are multiplied, the output is in 32.32 format.
Here we must normalize the multiplication output back to the 16.16 format to perform additional arithmetic operations. This can be easily achieved by leveraging Blackfin's MAC special arithmetic modes [Abf2005].
The high performance Blackfin DSP processor can efficiently perform two 32-bit number (in 16.16 format) multiplication (with Blackfin MAC units using the mixed mode option) by consuming 4 clock cycles as shown in Figure 2 below. We perform integer quantities multiplication (i.e. a*c in 16.0 x16.0 format) using an integer signed option and then extract the resulting lower 16-bits into r5.h.
We then perform fractional quantities multiplication (i.e. b*d in 0.16x0.16 format) using fractional unsigned option and we extract the resulting higher 16-bits into r5.l so that r5 is 16.16 format, and then perform the MAC operation (i.e. a*d + c*b in 16.0 x 0.16 format) using mixed mode option and extracted to r3 (also in 16.16 format).
Now, summing the quantities in r3 and r5 result in normalized output in 16.16 format for the multiplication of two 16.16 numbers.
The Blackfin assembly code for IIR filter iterative loop in Goertzel algorithm is shown in Figure 3 below. In Figure 3, we are spending two cycles for updating the intermediate values using simple move operations.
These move operations can be avoided by unrolling the loop twice and directly placing the values in the next computational expression as in the pseudocode shown in Figure 4 below.
The corresponding assembly code is shown in Figure 5 below. This effectively requires six clock cycles to perform a single iteration of an IIR filter loop.
Next, the magnitude square value is computed (see Figure 1) using four multiplications and two additions which require about 18 clock cycles. Once the frequency component magnitude square value is determined then detecting the frequency component presence is achieved by simply applying an appropriate threshold value.
Conclusion
Hazarathaiah Malepati joined Analog Devices in 2003, where he is currently working on embedded algorithm software development for the Blackfin family of processors. From 2000 to 2003, he worked as a research engineer in HIRP (HFCL-IISc research program), Bangalore, India. He has a masters in industrial electronics from KREC, Surathkal.
Yosi Stein serves as DSP principal system architect/advanced technologies manager, working in the Digital Media Technology Center on the development of broadband communication and image compression enhanced instruction set for Analog Devices' fixed-point DSP family. Yosi holds a B.S.c in electrical engineering from Technion--Israel Institute of Technology.
References
[Abf2005] Analog Devices Inc, "ADSP-BF53x/BF56x Blackfin Processor Programming Reference", Rev 1.0, Jun 2005.

Figure 2: Efficient implementation of 16.16 multiplication on the Blackfin DSP

Figure 3: Assembly code for Goertzel algorithm IIR filter loop

Figure 4: Pseudocode showing the efficient flow of Goertzel algorithm

Figure 5: Unrolling the loop to efficiently implement the Goertzel algorithm
With the Blackfin BF5xx processors, about 6 cycles (of which 4 cycles are spent for multiplication of two 16.16 format numbers) are consumed per iteration in the IIR filter loop of the Goertzel algorithm and about 20 cycles for the FIR filter operation. If N = 200, then we require about 1220 cycles to detect one frequency tone in the given signal using Goertzel algorithm.
[Pro1992] John G Proakis and Dimitris G Manolakis, "Digial Signal Processing: Principles, Algorithms and Applications", Second Edition, Macmillan Publishing Company, 1992.



Loading comments... Write a comment