# Working with floating point parameters in an integer world

April 4, 2008

We live in an analog world (Do we really? I'll let you ponder that question on your own time!), and the most accurate way to represent that continuous analog world in software is to use floating point values.

That's great if you have a budget that can support an external floating point processor, or if your processing time requirements are flexible enough that you can implement the floating point arithmetic in software. The rest of us have to shave off some of the precision of the floating point value and stuff it into a scaled fixed point data type.

There are several ways of handling floating point parameters on fixed point processors. One method is to define them as macros with the scaling embedded in the definition.

#define  AccelDueToGravity  (((9.80665 * Scaler) +0.5))

The value for Scaler is typically a power of 2, since integer processors handle power-of-2 arithmetic efficiently. Adding 0.5 makes the truncation that occurs when the value is converted to an integer behave like rounding to the nearest integer (for non-negative values). Defining the parameter in this manner is fine if the scaling is identical throughout the code.
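As a minimal sketch of the effect of the 0.5 term, assume Scaler is 64 (2^6) and the result is stored in a 16-bit integer. Both of those choices, and the helper names accel_fixed and accel_truncated, are assumptions made for illustration only.

```c
#include <stdint.h>

/* Assumed scale factor for this sketch: 2^6 = 64 counts per unit. */
#define Scaler 64

/* The macro from the text: scale, then add 0.5 so truncation rounds to nearest. */
#define AccelDueToGravity  (((9.80665 * Scaler) + 0.5))

/* Hypothetical helper: the constant as a 16-bit fixed point value.
 * 9.80665 * 64 = 627.6256; adding 0.5 gives 628.1256; the cast truncates to 628. */
static int16_t accel_fixed(void)
{
    return (int16_t)AccelDueToGravity;
}

/* Hypothetical helper: the same constant without the 0.5 term.
 * 627.6256 truncates to 627, one count below the nearest integer. */
static int16_t accel_truncated(void)
{
    return (int16_t)(9.80665 * Scaler);
}
```

Note that the macro itself still expands to a floating point expression. The conversion to an integer only happens at the cast, which is why nothing in the code records the scaling of the stored value.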

One drawback is that in order to determine the scaling of AccelDueToGravity in the code you have to refer back to its definition. This can get very cumbersome if you have equations with several parameters all "integerized" with different scalers. It also presents a maintenance headache long term.

When working with a fixed point integer processor I have found the following definitions for implementing scalar and fixed point adjustment macros useful. These eliminate the need for using "magic numbers" in equations or for defining the numbers explicitly.

Additionally, you get the advantage that the expression is calculated in the highest numerical precision of the compiler (typically as float or double), scaled to the specified LSB, rounded to the nearest integer, and finally cast to "type".

The following macros can be used for explicit compile-time constant calculations. The LSB power must satisfy -31 < LSB < 31. The macro name is derived from Scale 2 (p)os or (n)eg X, which reads as "scale is 2 to the power of positive or negative X", where X is a numeric value. Below are some of the macro definitions, centered on 2^0.

#define    S2p6(x,t)      ((t)((x)*64.0+0.5))
#define    S2p5(x,t)      ((t)((x)*32.0+0.5))
#define    S2p4(x,t)      ((t)((x)*16.0+0.5))
#define    S2p3(x,t)      ((t)((x)*8.0+0.5))
#define    S2p2(x,t)      ((t)((x)*4.0+0.5))
#define    S2p1(x,t)      ((t)((x)*2.0+0.5))
#define    S2p0(x,t)      ((t)(0.5+(x)))
#define    S2n0(x,t)      ((t)(0.5+(x)))
#define    S2n1(x,t)      ((t)((x)/2.0+0.5))
#define    S2n2(x,t)      ((t)((x)/4.0+0.5))
#define    S2n3(x,t)      ((t)((x)/8.0+0.5))
#define    S2n4(x,t)      ((t)((x)/16.0+0.5))
#define    S2n5(x,t)      ((t)((x)/32.0+0.5))
#define    S2n6(x,t)      ((t)((x)/64.0+0.5))

The value for x is the number being scaled, and t is the data type that the result is cast to once the compiler has evaluated the constant expression.
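A short sketch of how these macros might be used to define constants. The variable names and the numeric values are hypothetical, and the fixed-width types from stdint.h are an assumption; any integer type could be passed as t.

```c
#include <stdint.h>

/* Two of the macros from the listing above, repeated so the sketch is self-contained. */
#define    S2p6(x,t)      ((t)((x)*64.0+0.5))
#define    S2n2(x,t)      ((t)((x)/4.0+0.5))

/* 9.80665 scaled by 2^6: 627.6256 + 0.5 = 628.1256, cast to 628. */
static const int16_t accel_s2p6 = S2p6(9.80665, int16_t);

/* 6000.0 scaled by 2^-2: 1500.0 + 0.5 = 1500.5, cast to 1500.
 * Each count of this constant is worth 4 units. */
static const uint16_t speed_s2n2 = S2n2(6000.0, uint16_t);
```

Because the scaling appears in the macro name at the point of use, the reader does not have to look up a separate definition to know that accel_s2p6 carries 6 fractional bits.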

There are times when you need to adjust the fixed point position within an equation. This is usually done to convert the final result to a different power of 2 or to match the fixed point of intermediate results. As a basic example, consider the following equation.

x = y + z;

In fixed point math, y and z must be scaled to the same power of 2 for the addition to make any sense. If y is scaled to 2^4 and z is scaled to 2^10, the resultant x will contain a number. But what is the resultant scalar? It really doesn't make sense unless you intentionally coded the equation this way for job security. Bravo, way to keep the jobs at home.

What I would do is raise the scalar power of y to match that of z, execute the equation, and then adjust the result to the required resultant scalar. This is implemented by brute force in the following equation.

x = ((y * 64) + z) / 64;    /* 64 = 2^6 */

The resultant scalar in this case is 2^4, since the divisor of 2^6 is used to adjust the final value before stuffing it into x. Although I did not show it here, you may need to explicitly cast some of the intermediate values to make sure that they fit in the data type.
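The brute force approach, including the intermediate cast mentioned above, could be sketched as follows. The function name, the 16-bit operand types, and the 32-bit intermediate are assumptions chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: y is scaled to 2^4, z is scaled to 2^10,
 * and the returned value is scaled to 2^4. */
static int16_t add_mixed_scale(int16_t y_s4, int16_t z_s10)
{
    /* Widen to 32 bits first so that y * 2^6 cannot overflow 16 bits. */
    int32_t sum_s10 = ((int32_t)y_s4 * 64) + (int32_t)z_s10;

    /* Divide by 2^6 to bring the sum back to a 2^4 scaling.
     * The caller is assumed to guarantee that the result fits in 16 bits. */
    return (int16_t)(sum_s10 / 64);
}
```

As a worked case, y = 1.5 is stored as 24 (1.5 * 16) and z = 2.25 is stored as 2304 (2.25 * 1024). The sum is 24 * 64 + 2304 = 3840, and 3840 / 64 = 60, which is 3.75 at a 2^4 scaling.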

There is a better way. The solution of course is to use macros. The following are macros used for fixed point adjustment constant multipliers. These change the power of the LSB of a variable.

EXAMPLE: SApY(x) increases the power of the LSB of x by Y, and SAnY(x) decreases the power of the LSB of x by Y. These can be used on any integer operand type.

#define      SAp4(x)     ((x)<<(4))
#define      SAp3(x)     ((x)<<(3))
#define      SAp2(x)     ((x)<<(2))
#define      SAp1(x)     ((x)<<(1))
#define      SAp0(x)     (x)
#define      SAn0(x)     (x)
#define      SAn1(x)     ((x)/2)
#define      SAn2(x)     ((x)/4)
#define      SAn3(x)     ((x)/8)
#define      SAn4(x)     ((x)/16)
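Applied to the earlier x = y + z case, the adjustment macros might be used as in the sketch below. SAp6 and SAn6 are not in the listing above; they are written here by following the same pattern. The function name and the operand types are also assumptions.

```c
#include <stdint.h>

/* Assumed extensions of the listing above, following the same pattern. */
#define      SAp6(x)     ((x)<<(6))
#define      SAn6(x)     ((x)/64)

/* Hypothetical helper: y is scaled to 2^4, z to 2^10, result is scaled to 2^4.
 * y is assumed to be non-negative here, because left-shifting a negative
 * signed value is undefined behavior in C. */
static int16_t add_mixed_scale_macros(int16_t y_s4, int16_t z_s10)
{
    /* Raise y by 6 places to match z, in a 32-bit intermediate. */
    int32_t sum_s10 = SAp6((int32_t)y_s4) + (int32_t)z_s10;

    /* Lower the sum by 6 places to return to a 2^4 scaling. */
    return (int16_t)SAn6(sum_s10);
}
```

The down-adjust macros use division rather than a right shift, so a negative operand truncates toward zero in the same way on every C99 compiler. The up-adjust macros use a left shift, which is the reason for the non-negative assumption in this sketch.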

When used in an equation, once you get used to them, they make the equation easier to maintain. For example:

out = SAn8((s32_t)in1 * S2p8(param1, signed int)+(s32_t)in2 * S2p8(param2, signed int));

The output scaling depends on the input scaling of in1 and in2. What can be seen is that both parameters, param1 and param2, are scaled to 2^8 and cast to a signed int type. The final result is fixed point adjusted by 8 places to the right. If you had to write this equation the way many do, you would get the following:

out = ((s32_t)in1 * (signed int)(param1*256)+(s32_t)in2 * (signed int)(param2*256))/256;

If written like this you may question the last numeric value of 256 in the equation. Is it used as a fixed point adjustment or is it some other constant that is used in the equation? In this case, the 256 values are magic numbers that litter the equation.
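To compare the two forms side by side, here is a minimal compilable sketch. S2p8 and SAn8 are not in the listings above and are written by following the same pattern. The s32_t typedef, the values of param1 and param2, the input types, and the function names are all assumptions made for illustration.

```c
#include <stdint.h>

typedef int32_t s32_t;   /* assumed: s32_t is a signed 32-bit type */

/* Assumed extensions of the earlier listings, following the same pattern. */
#define    S2p8(x,t)     ((t)((x)*256.0+0.5))
#define    SAn8(x)       ((x)/256)

/* Hypothetical floating point parameters. */
#define    param1        0.75
#define    param2        0.25

/* Macro form: the scaling of each term is visible in the macro names. */
static s32_t blend_macros(int16_t in1, int16_t in2)
{
    return SAn8((s32_t)in1 * S2p8(param1, signed int)+(s32_t)in2 * S2p8(param2, signed int));
}

/* Magic number form: the same arithmetic, but the role of each 256 is implicit,
 * and the parameters are truncated rather than rounded. */
static s32_t blend_magic(int16_t in1, int16_t in2)
{
    return ((s32_t)in1 * (signed int)(param1*256)+(s32_t)in2 * (signed int)(param2*256))/256;
}
```

With in1 = 100 and in2 = 200, both forms compute (100 * 192 + 200 * 64) / 256 = 32000 / 256 = 125. The two forms agree here because 0.75 and 0.25 scale to exact integers; for parameters that do not, the rounding in S2p8 can make the macro form differ by one count in the scaled parameter.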

Using the macros for scaling takes some getting used to. But believe it or not, once you get the hang of it, it really does make the code easier to read and debug. You can recognize potential scaling errors without having to refer back to their definitions in an external header file.

Dinu P. Madau is a Software Technical Fellow with Visteon. He has been developing software for embedded systems for over 22 years. He has an MSE in computer and electrical control systems engineering from Wayne State University and a BSE in computer engineering. Dinu has developed safety-critical software for anti-lock brakes, vehicle stability control, and suspension controls and is currently working in Advanced Cockpit Electronics and Driver Awareness Systems at Visteon developing systems leveraging vision and radar technologies. He can be reached by e-mail at dmadau@visteon.com.
