Line-fitting algorithms for exceptional cases–minimax line fitting

In many embedded systems design applications, line-fitting techniques, particularly minimax algorithms, are used instead of the better-known regression methods to "fit" noisy data. More often than not, minimax fitting is preferred to regression when the goal is to find a fitting line with minimum tolerance, as in some automatic control applications.

My first exposure to minimax line fitting came via an engineer who asked me to help him fit some sampled sensor data. He was getting unacceptable results with regression and thought that minimax might do better. The engineer told me that he wanted something quick and simple and that small code size was required, but that high performance was not a priority because the computations would be done off-line.

This article presents my own attempts to meet these requirements. It describes both useful strategies and pitfalls to avoid in tackling similar problems.

Regression does not always give the “best” linear fit

By far the most common way to fit noisy data with a line is to use simple regression. If the noisy data consists of points (xn, yn), n = 1, …, N, then the regression equations give:

Y = M•X + B

where M and B are defined in terms of the accessory values ΣX, ΣY, ΣXX, and ΣXY as follows:

ΣX = the sum of the x-values = Σn xn

ΣY = the sum of the y-values = Σn yn

ΣXX = the sum of the x²-values = Σn xn²

ΣXY = the sum of the xy-values = Σn xn yn

M = (N•ΣXY – ΣY•ΣX) / (N•ΣXX – ΣX•ΣX)

B = (ΣY – M•ΣX) / N.
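As a quick illustration, these formulas amount to only a few lines of Octave. The sketch below assumes x and y are equal-length vectors of samples; the function name regress_line is just for illustration:

  function [M, B] = regress_line(x, y)
    % Simple linear regression y = M*x + B using the accessory sums above.
    N   = numel(x);
    SX  = sum(x);        % sum of the x-values
    SY  = sum(y);        % sum of the y-values
    SXX = sum(x .^ 2);   % sum of the squared x-values
    SXY = sum(x .* y);   % sum of the x*y products
    M = (N*SXY - SY*SX) / (N*SXX - SX*SX);
    B = (SY - M*SX) / N;
  end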

Nonetheless, there are many cases where regression is not the best option, for instance when:
• a minimax fit is required (in other words, one that minimizes the maximum deviation between line and data);
• the noise varies systematically in such a way that it affects the linear trend;
• the linear fitting is taking place in real time and is continually adjusted as new data comes in;
• there is a possible alias in the y-value (for instance, with angle data there may be an ambiguity of ±kπ or ±2kπ).

This article discusses minimax fitting. Additional articles, which will appear later this month on Embedded.com, will deal with the other cases.

Minimax fitting: polynomials and lines
Minimax fitting is preferred to regression when the goal is to find a fitting line with minimum tolerance. This makes it appropriate for some automatic control applications.

The difference between minimax and least-squares fitting is most apparent when the deviations are not symmetrical, as shown in Figure 1. The regression line follows the bulk of the data, while the minimax line tries to "split the difference" between high and low deviations. In the case shown in Figure 1, the minimax line gives an error tolerance about 40% smaller than that achieved via regression.


[Figure 1]

Libraries of downloadable algorithms
At the time I started working on the problem, I was not aware of a linear minimax fitting algorithm. However, I did know of a general method for polynomial minimax fitting called the Remez algorithm (polynomial fitting includes linear fitting as a special case). There are numerous references to the Remez algorithm on the web (including Wikipedia) and in numerical analysis textbooks.

With a little searching, I found downloadable Remez code available from various Web libraries. Table 1 summarizes the results of my investigation. Unfortunately, all the codes in Table 1 were designed for general-purpose use and were far too large and complex for the specific application I wanted.


[Table 1]

My first thought was to try to simplify one of the codes, based on the documentation provided. As shown in Table 1, only two libraries provided documentation of the algorithm, while the others gave references to the literature. I wanted to avoid digging through textbooks and journals if possible, so I started hopefully through the www.boost.org article on the Remez algorithm. My enthusiasm was somewhat dampened when I encountered several mathematical statements that I could easily prove were not true. My guess is that the article was written by an implementer who did not fully understand the algorithm–caveat lector!

The heart of the algorithm
The Mathworks documentation, on the other hand, I found to be accurate and very understandable, with clear explanations illustrated by excellent figures. There I found the following pearl of wisdom:

Chebyshev … gave the criteria for a polynomial to be a minimax polynomial. Assuming that the given interval is [a , b ], Chebyshev's criteria states that if Pn (x ) is the minimax polynomial of degree n then there must be at least (n +2) points in this interval at which the error function attains the absolute maximum value with alternating sign.

Now here was something I could take to the bank! In the case of a linear fit (n = 1), there had to be (at least) three data points where the maximum error was achieved (here we denote the three points as P1, P2, and P3, where Pj = (xj, yj) and x1 < x2 < x3). Furthermore, there were only two possibilities for these three points:

(A) P1, P2, and P3 lie above, below, and above the fitted line, respectively;
(B) P1, P2, and P3 lie below, above, and below the fitted line, respectively.

These two possibilities are shown in Figure 2. Thinking geometrically about these two possibilities quickly led to several conclusions (shown in Figure 3):
• P1 and P3 must be consecutive points in the convex hull¹ of all data points. This must be true because all data points either lie below the line P1P3, as in case (A), or above the line P1P3, as in case (B).
• P2 must also be a vertex point on the convex hull of all data points. This holds because all data points lie either above (case A) or below (case B) the line through P2 that is parallel to P1P3.


[Figure 2]

[Figure 3]

• Segment P1P3 must be parallel to the fitted line. This is necessarily true because P1 and P3 have the same error, hence the same vertical distance to the line.
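In other words, once P1, P2, and P3 are identified, the minimax line follows immediately: it has the slope of segment P1P3 and lies halfway between that segment and the parallel line through P2. A minimal Octave sketch of this step (with P1, P2, and P3 passed as [x y] pairs; the function name is just for illustration) might look like:

  function [m, b, tol] = line_from_alternation(P1, P2, P3)
    % Minimax line determined by the three alternation points.
    m   = (P3(2) - P1(2)) / (P3(1) - P1(1));   % slope of segment P1P3
    b13 = P1(2) - m*P1(1);                     % intercept of the line through P1 and P3
    b2  = P2(2) - m*P2(1);                     % intercept of the parallel line through P2
    b   = (b13 + b2) / 2;                      % split the difference between the two lines
    tol = abs(b13 - b2) / 2;                   % resulting maximum (minimax) error
  end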

These findings prompted the following straightforward prescription for an algorithm (a rough Octave sketch of steps 2 through 5 follows the list):

  1. Find all vertices of the convex hull of all data points. The vertices are divided into two sets: upper envelope (U) and lower envelope (L).
  2. For every pair of consecutive vertices in U (denote these vertices as (u1,v1) and (u2,v2)):

    2.1. For all vertices (x, y) in L with u1 ≤ x ≤ u2, find the vertex with the largest vertical distance to the segment joining (u1,v1) and (u2,v2). This vertical distance is given by:

    D = x×(v2 – v1)/(u2 – u1) + (v1×u2 – v2×u1)/(u2 – u1) – y

  3. Select the upper vertices (u1,v1)+, (u2,v2)+ and lower vertex (x,y)+ that correspond to the overall largest vertical distance D+ found in step 2.
  4. Repeat steps 2 and 3, switching the roles of upper envelope (U) and lower envelope (L). Select the lower vertices (u1,v1)−, (u2,v2)− and upper vertex (x,y)− that correspond to the overall largest vertical distance D−.
  5. a. If D+ ≥ D−, then the minimax line is parallel to the segment [(u1,v1)+, (u2,v2)+] and lies a vertical distance D+/2 below the segment;
    b. If D+ < D−, then the minimax line is parallel to the segment [(u1,v1)−, (u2,v2)−] and lies a vertical distance D−/2 above the segment.
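Steps 2 through 5 can be sketched in Octave as follows. This is only a rough sketch, not the prototype of Listing 1: it assumes the upper and lower envelopes U and L are already available as two-column matrices of hull vertices sorted by increasing x, and the function names minimax_from_hull and largest_gap are just for illustration.

  function [m, b, tol] = minimax_from_hull(U, L)
    % Steps 2-5: scan both envelopes for the largest vertical gap (steps 2-4),
    % then place the minimax line halfway across that gap (step 5).
    [Dplus,  mplus,  bplus ] = largest_gap(U, L);   % lower vertices vs. upper segments
    [Dminus, mminus, bminus] = largest_gap(L, U);   % upper vertices vs. lower segments
    if (Dplus >= Dminus)
      m = mplus;   b = bplus  - Dplus/2;   tol = Dplus/2;    % shift down by D+/2
    else
      m = mminus;  b = bminus + Dminus/2;  tol = Dminus/2;   % shift up by D-/2
    end
  end

  function [Dbest, mbest, bbest] = largest_gap(E1, E2)
    % Step 2: for each pair of consecutive vertices of envelope E1, find the
    % vertex of E2 between them (in x) with the largest vertical distance.
    Dbest = -Inf;  mbest = 0;  bbest = 0;
    for k = 1:size(E1,1)-1
      u1 = E1(k,1);    v1 = E1(k,2);
      u2 = E1(k+1,1);  v2 = E1(k+1,2);
      m = (v2 - v1)/(u2 - u1);               % slope of the envelope segment
      b = (v1*u2 - v2*u1)/(u2 - u1);         % intercept of the envelope segment
      idx = find(E2(:,1) >= u1 & E2(:,1) <= u2);
      for j = idx(:)'
        D = abs(m*E2(j,1) + b - E2(j,2));    % the vertical distance of step 2.1
        if (D > Dbest)
          Dbest = D;  mbest = m;  bbest = b;
        end
      end
    end
  end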

The key remaining task was to obtain an algorithm for finding the convex hull. A Google search on "convex hull algorithm" quickly led to the "Graham scan" article on Wikipedia, which even contains pseudocode. In the case at hand, the algorithm was made even easier by the fact that the data was already sorted in order of increasing x-coordinate and the x-coordinates were evenly spaced.
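For data already sorted by increasing x, the monotone-chain form of the Graham scan takes only a few lines of Octave. The sketch below is again an illustration rather than the code of Listing 1, with function names of my own choosing; it returns the upper and lower envelopes U and L expected by the previous sketch:

  function [U, L] = hull_envelopes(x, y)
    % Monotone-chain convex hull for points already sorted by increasing x.
    P = [x(:), y(:)];
    U = envelope(P, -1);   % upper envelope: pop on counterclockwise (or straight) turns
    L = envelope(P, +1);   % lower envelope: pop on clockwise (or straight) turns
  end

  function E = envelope(P, sgn)
    E = zeros(0, 2);
    for k = 1:size(P, 1)
      % Remove the last vertex while it makes the wrong kind of turn.
      while (size(E,1) >= 2 && sgn*cross2(E(end,:) - E(end-1,:), P(k,:) - E(end-1,:)) <= 0)
        E(end,:) = [];
      end
      E(end+1,:) = P(k,:);
    end
  end

  function c = cross2(a, b)
    c = a(1)*b(2) - a(2)*b(1);   % z-component of the 2-D cross product
  end

With these pieces, the whole fit chains together as [U, L] = hull_envelopes(x, y); [m, b, tol] = minimax_from_hull(U, L), which returns the fitted line y = m•x + b together with its worst-case error tol.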

Postscript: More than one way to skin a cat …
As I was researching for this article, I discovered a similar (but not identical) algorithm in the technical literature [Rey and Ward, 1987].

Prototype code
The prototype code in Listing 1 is written in Octave (an open-source Matlab-compatible language). Caveat: No serious attempt has been made to optimize the code either for reduced code size or faster run-time.

Listing 1:


[Listing 1: Octave prototype code]

Parts 2 and 3 of this article will be online on Embedded.com in the near future.

Chris Thron is currently assistant professor of mathematics at Texas A&M University Central Texas, and does consulting in algorithm design, numerical analysis, system analysis and simulation, and statistical analysis. Previously Chris was with Freescale Semiconductor, doing R&D in cellular baseband processing, amplifier predistortion, internet security, and semiconductor device performance. He has six U.S. patents granted plus three published applications. His web page is www.tarleton.edu/faculty/thron/, and he can be reached at .

Endnotes:
1. The convex hull of a set S is the smallest convex set containing S. Alternatively, it is the intersection of all half-planes that contain S.

References:
boost C++ libraries resources for minimax polynomial fitting:
• Minimax Approximations and the Remez Algorithm. Available from www.boost.org/doc/libs/1_36_0/libs/math/doc/sf_and_dist/html/math_toolkit/toolkit/internals2/minimax.html; accessed 18 March 2010 (describes code)
• The Remez Method. Available from www.boost.org/doc/libs/1_36_0/libs/math/doc/sf_and_dist/html/math_toolkit/backgrounders/remez.html; accessed 18 March 2010 (describes Remez algorithm)

Mathworks resources for minimax polynomial fitting:
• Tawfik, Sherif. Remez Algorithm, available for download at www.mathworks.com/matlabcentral/fileexchange/8094; accessed 18 March 2010. (includes Matlab code and .pdf documentation)

NAG library resources for minimax polynomial fitting:
• NAG Fortran Library Routine Document, E02ACF. Available from www.nag.co.uk/numeric/Fl/manual/pdf/E02/e02acf.pdf; accessed 18 March 2010.

• NAG Toolbox for Matlab e02ac. Available from www.nag.com/numeric/MB/manual_22_1/pdf/E02/e02ac.pdf; accessed 18 March 2010.

Netlib library resources for minimax polynomial fitting:
• ACM Algorithm #414, “Chebyshev Approximation of Continuous Functions by a Chebyshev System of Functions.” Available for download from http://www.netlib.no/netlib/toms/414; accessed 18 March 2010.

• Golub, G.H., and Smith, L.H. Chebyshev Approximation of Continuous Functions by a Chebyshev System of Functions. Available from ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/67/72/CS-TR-67-72.pdf (This report documents the Netlib Algol code)

• Rey, C. and Ward, R. “On Determining The On-Line Minimax Linear Fit To A Discrete Point Set In The Plane.” Information Processing Letters 24 (1987), 97-101.
