Manipulating C strings safely - Embedded.com

Here’s a simple mechanism to prevent software from manipulating an invalid string with well-known string functions.

Software failures caused by invalid character strings are common in any system, but they are especially critical in embedded software written in C, which frequently manipulates strings through functions declared in string.h, such as strcpy() and strcat(), that perform no bounds checking. This article proposes a simple mechanism to prevent software from passing an invalid string to these well-known functions. In this context, an invalid string is a sequence of characters that lacks the null character ‘\0’ marking its end.

Functions like strcpy() and strcat() can cause a fatal failure, such as an unintended memory overwrite, when at least one of their arguments, which is expected to be a string, is not properly terminated. When the software eventually crashes, the cause is usually very hard to find because this kind of failure does not manifest immediately.

Consider the following code fragment, where frame is an array of 12 characters and str is a pointer to char. The string str is expected to be at most 4 characters long, not counting the terminating null character. The code calls strcat() to append str, which is obtained from Geo_getLatitude(), to frame.

char frame[12], *str;

str = Geo_getLatitude(position);
strcat(frame, str);

Before executing the function strcat(), frame contains the string “@078,”.

The pointer str points to an array of characters that holds ‘0’, ‘.’, ‘0’, ‘8’, ‘$’, ‘T’, ‘i’, ‘4’ followed by a null character. The string is longer than expected: the null character should be where the character ‘$’ is located.

In this situation, strcat() fails silently: it does append the string “0.08$Ti4” to frame, but the result, “@078,0.08$Ti4” plus its terminating null character, needs 14 bytes, so the last two bytes overwrite the memory locations that follow frame, which were never allocated for it.
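
The arithmetic behind this overflow can be checked without triggering it. The helper below is only an illustrative sketch; the name requiredSize() is hypothetical and does not come from the article's code, and it assumes both arguments are properly terminated strings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes needed to hold dst followed by src,
 * including the terminating null character. Both arguments must be valid,
 * null-terminated strings. */
size_t
requiredSize(const char *dst, const char *src)
{
    return strlen(dst) + strlen(src) + 1;
}
```

With the values from the example, requiredSize("@078,", "0.08$Ti4") is 14, two bytes more than the 12 bytes of frame, whereas the expected string "0.08" would need only 10.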

To avoid this situation, this article proposes a simple function that detects an invalid string, i.e. one that exceeds the maximum expected length, before it is passed as an argument to any function declared in string.h.

In the following code fragment, the function SecString_strchk() checks whether str is terminated within its first MAX_LENGTHOF_STR characters. If it is not, SecString_strchk() returns false, str is discarded and the string empty is used instead.

static const char empty[] = "????";
...
res = SecString_strchk(str, MAX_LENGTHOF_STR);
strcat(frame, (res == false) ? empty : str);
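
The check-then-append pattern can also be wrapped in a single function. The sketch below is one possible arrangement, not the article's code: isTerminated() and appendChecked() are hypothetical names, the value 5 for MAX_LENGTHOF_STR is an assumption (4 characters plus the terminator), and the extra check on the remaining room in the destination goes beyond what the fragment above does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_LENGTHOF_STR 5   /* assumed: 4 characters plus the terminator */

static const char empty[] = "????";

/* Hypothetical helper: true when a null character is found within the
 * first num bytes of s. */
static bool
isTerminated(const char *s, size_t num)
{
    return memchr(s, '\0', num) != NULL;
}

/* Hypothetical wrapper: append str to frame, or the placeholder when str
 * is not terminated in time. Returns false, leaving frame untouched, when
 * frame itself is not terminated or has no room for the appended text. */
bool
appendChecked(char *frame, size_t frameSize, const char *str)
{
    const char *src = isTerminated(str, MAX_LENGTHOF_STR) ? str : empty;
    const char *end = memchr(frame, '\0', frameSize);
    size_t used, need;

    if (end == NULL)
    {
        return false;                /* destination is not a valid string */
    }
    used = (size_t)(end - frame);    /* always less than frameSize here */
    need = strlen(src) + 1;          /* characters plus the terminator */
    if (need > frameSize - used)
    {
        return false;                /* not enough room left in frame */
    }
    memcpy(frame + used, src, need);
    return true;
}
```

A call such as appendChecked(frame, sizeof frame, str) then replaces the two statements of the fragment, and the caller can test the return value to detect a destination that is too small.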

SecString_strchk() and SecString_strnlen() are very simple functions whose C source code and unit test cases are available in the X repository. SecString_strnlen() is a safer version of strlen(): it returns the length of the string s but never examines more than maxlen characters, and it returns maxlen when no null character is found within that limit.

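#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if s is terminated within its first num bytes. */
bool
SecString_strchk(const char *s, size_t num)
{
    const char *pos = memchr(s, '\0', num);
    return (pos != (const char *)0);
}

/* Return the length of s, or maxlen if no terminator is found in maxlen bytes. */
size_t
SecString_strnlen(const char *s, size_t maxlen)
{
    const char *pos = memchr(s, '\0', maxlen);
    return (pos != (const char *)0) ? (size_t)(pos - s) : maxlen;
}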
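
One use of a bounded length is a copy that truncates instead of overrunning. The following is a minimal sketch under stated assumptions: boundedLen() repeats the idea of SecString_strnlen() so that the example is self-contained, and copyBounded() is a hypothetical function that is not part of the article's code.

```c
#include <stddef.h>
#include <string.h>

/* Same idea as SecString_strnlen(): length of s, capped at maxlen. */
static size_t
boundedLen(const char *s, size_t maxlen)
{
    const char *pos = memchr(s, '\0', maxlen);
    return (pos != NULL) ? (size_t)(pos - s) : maxlen;
}

/* Hypothetical helper: copy src into dst, reading at most srcMax bytes
 * from src and writing at most dstSize bytes to dst. The result is always
 * terminated when dstSize is nonzero. Returns the number of characters
 * copied, not counting the terminator. */
size_t
copyBounded(char *dst, size_t dstSize, const char *src, size_t srcMax)
{
    size_t len;

    if (dstSize == 0)
    {
        return 0;                    /* no room even for the terminator */
    }
    len = boundedLen(src, srcMax);
    if (len > dstSize - 1)
    {
        len = dstSize - 1;           /* truncate to fit the destination */
    }
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

Because the source length is capped before anything is copied, an unterminated source of srcMax bytes yields a terminated copy of at most srcMax characters rather than a read past the end of the buffer.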

The following code fragments show a more detailed example of using SecString_strchk(). When Geo_getLatitude() is called, it invokes getAttribute(), which checks that the string me->latitude is terminated within LATITUDE_LENGTH + 1 characters. If it is not, getAttribute() calls the errorHandler() function to report the problem and returns a null pointer; otherwise it returns the string me->latitude.

/** \file Geo.c */
#include "SecString.h"
...
static char *
getAttribute(char *attribute, size_t bufSize)
{
    char *pos = (char *)0;
    bool res;
    res = SecString_strchk(attribute, bufSize);
    if (res == false)
    {
        if (errorHandler != (GeoErrorHandler)0)
        {
            errorHandler(INDEX_OUT_OF_RANGE);
        }
    }
    else
    {
        pos = attribute;
    }
    return pos;
}

char *
Geo_getLatitude(Geo *const me)
{
    return getAttribute(me->latitude, LATITUDE_LENGTH + 1);
}

#include "Geo.h"
...
static const char empty[] = "????";
...
value = Geo_getLatitude(position);
strcat(frame, (value == (char *)0) ? empty : value);
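
To try these fragments in isolation, the missing declarations have to be supplied. The sketch below shows one possible set; everything the article does not show is an assumption made for illustration: the value of LATITUDE_LENGTH, the layout of Geo, the GeoError and GeoErrorHandler types, the Geo_setErrorHandler() setter and the counting handler. The check inside getAttribute() is written with memchr() directly so that the example does not depend on SecString.h.

```c
#include <stddef.h>
#include <string.h>

#define LATITUDE_LENGTH 4                          /* assumed field width */

typedef enum { INDEX_OUT_OF_RANGE } GeoError;      /* assumed error codes */
typedef void (*GeoErrorHandler)(GeoError error);   /* assumed signature */
typedef struct
{
    char latitude[LATITUDE_LENGTH + 1];            /* assumed layout */
} Geo;

static GeoErrorHandler errorHandler = (GeoErrorHandler)0;
int errorCount = 0;                                /* demo-only counter */

/* Demo-only handler: just counts the reported errors. */
void
countError(GeoError error)
{
    (void)error;
    ++errorCount;
}

/* Hypothetical setter; the article does not show how the handler is installed. */
void
Geo_setErrorHandler(GeoErrorHandler handler)
{
    errorHandler = handler;
}

static char *
getAttribute(char *attribute, size_t bufSize)
{
    /* Same check as SecString_strchk(): look for the terminator in bufSize bytes. */
    if (memchr(attribute, '\0', bufSize) != NULL)
    {
        return attribute;
    }
    if (errorHandler != (GeoErrorHandler)0)
    {
        errorHandler(INDEX_OUT_OF_RANGE);
    }
    return (char *)0;
}

char *
Geo_getLatitude(Geo *const me)
{
    return getAttribute(me->latitude, LATITUDE_LENGTH + 1);
}
```

With countError() installed through Geo_setErrorHandler(), a Geo whose latitude field holds the five bytes ‘0’, ‘.’, ‘0’, ‘8’, ‘$’ makes Geo_getLatitude() return a null pointer and increments errorCount once, while a field holding "0.08" is returned unchanged.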

The method introduced here can be applied to any software system written in C that manipulates strings coming from files, non-volatile memory devices, communication protocol messages or other sources that cannot guarantee the integrity of those strings.

